World Model Design
The Dreamer 4 world model predicts the next state, reward, and continuation flag conditioned on action history.
Core components:
Temporal Transformer with 12–24 layers.
Latent dimension: 512–1024 depending on compute.
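Categorical latents with 1% unimix: each categorical distribution is mixed with 1% uniform probability so that no class probability reaches zero, which keeps the KL term stable.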
Shortcut forcing aligns long-term rollouts with physical realism.
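
As an illustration of the unimix item above, here is a minimal sketch in plain Python. The function names (softmax, unimix, categorical_kl) and the 4-class example are assumptions made for this sketch, not the model's actual code. It shows how mixing in 1% uniform mass puts a floor under every class probability, which in turn bounds the KL between any two such distributions by log(K / 0.01) for K classes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unimix(logits, mix=0.01):
    """Mix the softmax output with a uniform distribution (mix=0.01 is 1% unimix)."""
    probs = softmax(logits)
    k = len(probs)
    return [(1.0 - mix) * p + mix / k for p in probs]

def categorical_kl(p, q):
    """KL(p || q) in nats for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical example: two near one-hot distributions that disagree.
posterior = unimix([50.0, 0.0, 0.0, 0.0])
prior = unimix([0.0, 0.0, 0.0, 50.0])
kl = categorical_kl(posterior, prior)  # finite, below log(4 / 0.01)
```

Without the uniform component, the prior above would assign almost zero probability to the class the posterior prefers, and the KL would be very large; with it, the KL stays under the stated bound.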
Training losses:
Symlog reconstruction
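KL loss with free bits: the KL term is clipped from below at 1 nat, so it is not pushed lower once it is already under that threshold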
Reward prediction via two-hot bins
Continuation prediction via logistic head
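
The loss ingredients listed above can be sketched in a few lines of plain Python. This is a hedged illustration, not the model's implementation: the helper names, the three-bin grid, and the choice to apply two-hot encoding to a symlog-transformed reward are assumptions made for the example.

```python
import math

def symlog(x):
    """Sign-preserving log compression: sign(x) * ln(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)

def symexp(x):
    """Inverse of symlog: sign(x) * (exp(|x|) - 1)."""
    return math.copysign(math.expm1(abs(x)), x)

def two_hot(value, bins):
    """Spread a scalar over its two neighbouring bins (bins sorted ascending).

    The weights sum to 1 and their bin-weighted average equals the clipped value.
    """
    if len(bins) < 2:
        raise ValueError("need at least two bins")
    value = min(max(value, bins[0]), bins[-1])  # clip to the bin range
    weights = [0.0] * len(bins)
    for i in range(len(bins) - 1):
        lo, hi = bins[i], bins[i + 1]
        if lo <= value <= hi:
            w_hi = (value - lo) / (hi - lo)
            weights[i] = 1.0 - w_hi
            weights[i + 1] = w_hi
            break
    return weights

def free_bits(kl, threshold=1.0):
    """Clip the KL from below, so values under the threshold stay constant."""
    return max(kl, threshold)

# Hypothetical bin grid in symlog space; a real model would use many more bins.
bins = [-1.0, 0.0, 1.0]
target = two_hot(0.25, bins)          # weights on the bins around 0.25
decoded = sum(w * b for w, b in zip(target, bins))
reward_target = two_hot(symlog(0.5), bins)
```

In a training loop the two-hot vector would serve as the target for a cross-entropy loss over the reward head's bin logits, and free_bits would be applied to the KL value before it is weighted into the total loss.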
