Implementation Notes
For researchers, Dreamer 4 serves as both an algorithmic template and an engineering benchmark for scalable offline RL via world modeling.
Entropy Control: Keep the policy's entropy bonus coefficient moderate (≈ 3 × 10⁻⁴) so the policy does not collapse and exploit world-model imperfections.
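As a concrete illustration, here is a minimal sketch of a policy-gradient loss with an entropy bonus for discrete actions. All names (`policy_loss`, `log_softmax`) are illustrative, not from the Dreamer 4 codebase; the 3 × 10⁻⁴ coefficient comes from the note above.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def policy_loss(logits, actions, advantages, ent_coef=3e-4):
    """REINFORCE-style loss minus an entropy bonus (illustrative sketch)."""
    logp = log_softmax(logits)                 # (batch, n_actions)
    probs = np.exp(logp)
    entropy = -(probs * logp).sum(axis=-1)     # per-state policy entropy
    chosen = np.take_along_axis(logp, actions[:, None], axis=-1)[:, 0]
    pg = -(chosen * advantages).mean()         # policy-gradient term
    return pg - ent_coef * entropy.mean()      # entropy bonus lowers the loss
```

A larger `ent_coef` pushes the policy toward higher entropy; too small a value lets it latch onto spurious high-reward transitions the world model hallucinates.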
Rollout Length: Start with short imagination rollouts (8–16 steps) and expand to 64–128 steps as model fidelity stabilizes.
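A simple way to implement this is a schedule on the imagination horizon. The linear warmup below is a sketch under assumed hyperparameters (`warmup_steps` is hypothetical); in practice you would gate growth on model-fidelity metrics rather than step count alone.

```python
def rollout_length(step, start=16, final=64, warmup_steps=100_000):
    """Linearly grow the imagination horizon from `start` to `final` steps.

    Assumed schedule for illustration; gating on validation metrics of the
    world model (e.g. prediction error) is a more robust trigger.
    """
    frac = min(step / warmup_steps, 1.0)
    return int(start + frac * (final - start))
```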
Replay Ratio: Higher replay ratios (16–32 gradient updates per collected environment step) improve data efficiency; tune them to your compute budget.
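Bookkeeping for the replay ratio can be as simple as tracking how many gradient updates are "owed" relative to environment steps collected. This helper is a sketch with an assumed definition of replay ratio (updates per environment step):

```python
def updates_due(env_steps, grad_updates_done, replay_ratio=16):
    """Gradient updates still owed to hit the target replay ratio.

    Assumes replay_ratio = gradient updates per collected environment step;
    adjust if your codebase counts replayed samples instead.
    """
    target = env_steps * replay_ratio
    return max(target - grad_updates_done, 0)
```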
Free Bits & KL Balancing: Continue using Dreamer V3 defaults to stabilize latent distributions.
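For reference, the commonly cited Dreamer V3 defaults are a free-bits floor of 1 nat and asymmetric KL weights (dynamics ≈ 0.5, representation ≈ 0.1); verify against the release you are using. The sketch below shows only the clipping and weighting arithmetic; the stop-gradients that make one KL term train the prior and the other the posterior are applied upstream when computing `kl_dyn` and `kl_rep`.

```python
def kl_balanced_loss(kl_dyn, kl_rep, free_bits=1.0, beta_dyn=0.5, beta_rep=0.1):
    """Free bits clip each KL term at `free_bits` nats so the model is not
    penalized below that floor; the betas implement KL balancing.

    Default values follow commonly cited Dreamer V3 settings (assumption:
    check your codebase's config before relying on them).
    """
    return beta_dyn * max(kl_dyn, free_bits) + beta_rep * max(kl_rep, free_bits)
```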
Real-Time Profiling: The world model should generate each imagined step faster than a real environment frame; otherwise imagination RL slows dramatically.
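A throughput check like the following (illustrative helper, not part of any Dreamer release) makes the comparison explicit: measure imagined steps per second and compare it against the environment's frame rate.

```python
import time

def rollout_fps(step_fn, state, n=100):
    """Measure imagined steps per second for a single-step rollout function.

    `step_fn` is any callable mapping latent state -> next latent state.
    Compare the result against the env frame rate (e.g. 20 fps for Minecraft).
    """
    t0 = time.perf_counter()
    for _ in range(n):
        state = step_fn(state)
    return n / (time.perf_counter() - t0)
```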
Evaluation Cycle: Alternate between imagination updates and brief real-env validations if applicable.
Action Grounding Coverage: Ensure labeled data captures full action diversity (movement, crafting, camera, object interaction) to avoid bias in control dynamics.
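A quick coverage audit over the labeled data can surface such bias early. The category names below mirror the examples in the note and are placeholders for whatever action taxonomy your dataset uses:

```python
from collections import Counter

def action_coverage(labels, categories=("movement", "crafting", "camera", "interaction")):
    """Fraction of labeled frames per action category.

    Categories are illustrative; a near-zero fraction for any category
    signals that its control dynamics will be poorly grounded.
    """
    counts = Counter(labels)
    total = sum(counts.values()) or 1
    return {c: counts.get(c, 0) / total for c in categories}
```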
Model Checkpointing: Save both world-model and policy weights jointly to preserve latent–policy consistency.
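One way to enforce this is to write both weight sets into a single file with an atomic rename, so a crash can never leave a policy checkpoint paired with a stale world model. A minimal sketch (pickle stands in for whatever serialization your framework provides):

```python
import os
import pickle

def save_checkpoint(path, world_model_state, policy_state, step):
    """Save world-model and policy weights together, atomically."""
    blob = {"world_model": world_model_state, "policy": policy_state, "step": step}
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(blob, f)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file

def load_checkpoint(path):
    """Load the paired world-model and policy states."""
    with open(path, "rb") as f:
        return pickle.load(f)
```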
