Implementation Notes

For researchers, Dreamer 4 serves as both an algorithmic template and an engineering benchmark for scalable offline RL via world modeling.

  • Entropy Control: Keep a moderate entropy regularizer on the policy (coefficient ≈ 3 × 10⁻⁴) so the actor does not collapse onto exploits of world-model imperfections.

  • Rollout Length: Start with short imagination rollouts (8–16 steps) and expand to 64–128 steps as model fidelity stabilizes.

  • Replay Ratio: Higher ratios (16–32) improve data efficiency; tune them against the available compute budget.

  • Free Bits & KL Balancing: Continue using Dreamer V3 defaults to stabilize latent distributions.

  • Real-Time Profiling: Keep the world model's per-step rollout latency below the environment's frame time; otherwise imagination RL slows dramatically.

  • Evaluation Cycle: Alternate imagination updates with brief real-environment validation runs where an environment is available.

  • Action Grounding Coverage: Ensure labeled data captures full action diversity (movement, crafting, camera, object interaction) to avoid bias in control dynamics.

  • Model Checkpointing: Save both world-model and policy weights jointly to preserve latent–policy consistency.
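The entropy-control bullet above can be sketched as an entropy-regularized actor loss. This is a minimal illustration, not the paper's implementation: the function name `actor_loss` and the REINFORCE-style objective are assumptions; only the coefficient value comes from the notes.

```python
import numpy as np

ENT_COEF = 3e-4  # entropy coefficient from the notes above

def actor_loss(log_probs, advantages, entropy, ent_coef=ENT_COEF):
    """Policy-gradient loss minus an entropy bonus.

    Minimizing this maximizes expected advantage while keeping policy
    entropy from collapsing, which discourages the actor from exploiting
    world-model imperfections. Illustrative sketch, not Dreamer 4's API.
    """
    pg = -(log_probs * advantages).mean()   # REINFORCE-style term
    return pg - ent_coef * entropy.mean()   # entropy bonus lowers the loss

loss = actor_loss(
    log_probs=np.array([-1.2, -0.8, -2.0]),
    advantages=np.array([0.5, -0.3, 1.1]),
    entropy=np.array([1.5, 1.4, 1.6]),
)
```

A larger `ent_coef` lowers the loss for high-entropy policies, which is exactly the pressure that keeps the policy stochastic inside an imperfect model.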
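The rollout-length bullet suggests growing the imagination horizon over training. A linear warm-up is one simple way to do that; the function name, the step counts, and the linear shape are all assumptions for illustration.

```python
def imagination_horizon(step, warmup_steps=100_000, start=16, final=128):
    """Grow the imagination rollout length linearly from `start` to
    `final` over `warmup_steps` gradient steps, then hold at `final`.

    A fidelity-triggered schedule (expand once model loss plateaus)
    would follow the notes even more closely; this linear ramp is the
    simplest stand-in.
    """
    frac = min(step / warmup_steps, 1.0)
    return int(round(start + frac * (final - start)))
```

Calling it each gradient step yields 16-step rollouts early on and 128-step rollouts once the warm-up completes.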
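The free-bits and KL-balancing bullet can be written out as a small loss combiner. The structure (clip each KL term at a free-bits floor, weight the dynamics and representation terms separately) follows the commonly cited DreamerV3 recipe; the exact coefficients shown here are stated as an assumption, and the two KL inputs are presumed to already carry the appropriate stop-gradients.

```python
def balanced_kl(kl_dyn, kl_rep, free_bits=1.0, beta_dyn=0.5, beta_rep=0.1):
    """Free-bits clipping plus KL balancing, DreamerV3-style sketch.

    `kl_dyn` is KL(sg(posterior) || prior) — trains the prior toward the
    posterior; `kl_rep` is KL(posterior || sg(prior)) — regularizes the
    posterior. Clipping at `free_bits` stops the KL from collapsing the
    latent distribution; the betas set the balance between the two.
    """
    return beta_dyn * max(kl_dyn, free_bits) + beta_rep * max(kl_rep, free_bits)
```

When both KL terms fall below the free-bits floor, the loss goes flat, leaving the latents free capacity instead of squeezing them to zero.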
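The checkpointing bullet asks for world-model and policy weights to be saved jointly. One hedged sketch is to write both into a single file with an atomic rename, so the two can never drift apart on disk; the dict layout, function name, and use of `pickle` are illustrative choices, not a fixed format.

```python
import os
import pickle

def save_checkpoint(path, world_model_state, policy_state, step):
    """Write world-model and policy weights into one checkpoint file.

    Saving both in a single blob preserves latent-policy consistency:
    a restored policy always sees the latent space it was trained in.
    The temp-file + os.replace dance makes the write atomic, so readers
    never observe a partially written checkpoint.
    """
    blob = {"world_model": world_model_state, "policy": policy_state, "step": step}
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(blob, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows
```

Restoring is the mirror image: load the blob once and hand each sub-state to its owner, rather than loading two files that may be from different steps.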
