Comparison to Dreamer V3
Dreamer 4 extends Dreamer V3's philosophy of domain-general learning into an offline, imagination-only training regime, unifying perception, dynamics, and control within a single scalable framework.
| Aspect | Dreamer V3 | Dreamer 4 |
|---|---|---|
| Training mode | Online RL + imagination | Offline → imagination-only RL |
| World model | Recurrent RSSM + block GRU | Transformer-based latent dynamics + shortcut forcing |
| Data source | Environment interaction | Unlabeled videos + small labeled subset |
| Generalization | Fixed hyperparameters across 150 tasks | Cross-domain, cross-modality robustness from offline data |
| Compute scaling | Improves with model size & replay ratio | Linear scaling to real-time world simulation |
| Signature demo | Diamonds in Minecraft (from scratch) | Diamonds in Minecraft from offline data only |
| Core losses | Symlog, two-hot, KL balancing + free bits | Adds shortcut-forcing + object-interaction fidelity terms |
| Goal | General RL without retuning | General AI agent training without real-world steps |
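The "Core losses" row mentions DreamerV3's symlog transform, two-hot return encoding, and free-bits KL clipping. A minimal NumPy sketch of these ideas follows; function names, the bin layout, and the 1-nat floor default are illustrative choices of mine, not the paper's exact implementation:

```python
import numpy as np

def symlog(x):
    # Symmetric log squashing: compresses large-magnitude reward/value
    # targets while staying linear near zero.
    return np.sign(x) * np.log1p(np.abs(x))

def symexp(x):
    # Inverse of symlog.
    return np.sign(x) * np.expm1(np.abs(x))

def two_hot(y, bins):
    # Encode scalar y as probability mass split between its two
    # neighboring bins, so a categorical head can regress continuous values.
    y = float(np.clip(y, bins[0], bins[-1]))
    k = int(np.clip(np.searchsorted(bins, y) - 1, 0, len(bins) - 2))
    lo, hi = bins[k], bins[k + 1]
    w_hi = 0.0 if hi == lo else (y - lo) / (hi - lo)
    probs = np.zeros(len(bins))
    probs[k], probs[k + 1] = 1.0 - w_hi, w_hi
    return probs

def free_bits(kl, floor=1.0):
    # Free bits: clip the KL term at a floor (DreamerV3 uses 1 nat) so
    # already-small KLs stop contributing gradient in an autodiff framework.
    return np.maximum(kl, floor)

# Spacing bins uniformly in symlog space covers a huge value range
# with few bins (an assumption here: 41 bins over [-5, 5] symlog units).
bins = symexp(np.linspace(-5.0, 5.0, 41))
p = two_hot(2.5, bins)
decoded = float(p @ bins)  # expectation under the two-hot distribution
```

Because the two-hot weights are linear in the target between the two neighboring bins, the expectation `p @ bins` reconstructs the original scalar exactly, which is what makes the categorical parameterization a drop-in replacement for scalar regression.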