World Model Design

The Dreamer 4 world model predicts the next state, reward, and continuation flag conditioned on action history.

Core components:

  • Temporal Transformer with 12–24 layers.

  • Latent dimension: 512–1024 depending on compute.

  • Categorical latents with 1% unimix (each categorical distribution is mixed with 1% uniform) for KL stability.

  • Shortcut forcing aligns long-term rollouts with physical realism.
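
To make the unimix item concrete, here is a minimal NumPy sketch. It assumes the usual formulation, where the softmax output is blended with a uniform distribution over classes so that no class probability reaches zero. The function names (unimix_probs, categorical_kl) are illustrative and do not come from the Dreamer 4 code.

```python
import numpy as np

def unimix_probs(logits, unimix=0.01):
    """Blend softmax probabilities with a uniform distribution (assumed form)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Every class receives at least unimix / num_classes probability mass.
    num_classes = logits.shape[-1]
    return (1.0 - unimix) * probs + unimix / num_classes

def categorical_kl(p, q):
    """KL(p || q) in nats, summed over the last axis."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

# Two nearly deterministic distributions that disagree on the likely class.
posterior = unimix_probs([1000.0, 0.0, 0.0, 0.0])
prior = unimix_probs([0.0, 1000.0, 0.0, 0.0])
kl = categorical_kl(posterior, prior)
```

Without the uniform component, the softmax of such extreme logits contains exact zeros in float64 and the log terms in the KL are no longer finite. With 1% unimix, each of the four classes keeps at least 0.0025 probability, so the KL stays bounded.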

Training losses:

  • Symlog reconstruction

  • KL with free bits = 1 nat

  • Reward prediction via two-hot bins

  • Continuation prediction via logistic head
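
The loss terms listed above can be sketched in NumPy as follows. This is a hedged illustration that assumes DreamerV3-style conventions: symlog-squashed regression targets, two-hot targets over evenly spaced bins in symlog space, a KL clipped from below at the free-bits threshold, and binary cross-entropy on a continuation logit. All function names and the 41-bin grid are assumptions made for the example, not values from the source.

```python
import numpy as np

def symlog(x):
    """Symmetric log squashing: sign(x) * ln(1 + |x|)."""
    return np.sign(x) * np.log1p(np.abs(x))

def symexp(x):
    """Inverse of symlog."""
    return np.sign(x) * np.expm1(np.abs(x))

def symlog_mse(pred, target):
    """Squared error between a prediction and the symlog of the target."""
    return np.mean((np.asarray(pred) - symlog(np.asarray(target))) ** 2)

def twohot(value, bins):
    """Encode a scalar as weights on its two neighbouring bins (sums to 1)."""
    value = float(np.clip(value, bins[0], bins[-1]))
    # Upper neighbour index, clipped so that both neighbours are valid.
    hi = int(np.clip(np.searchsorted(bins, value, side="right"), 1, len(bins) - 1))
    lo = hi - 1
    w_hi = (value - bins[lo]) / (bins[hi] - bins[lo])
    target = np.zeros(len(bins))
    target[lo] = 1.0 - w_hi
    target[hi] = w_hi
    return target

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def reward_loss(logits, reward, bins):
    """Cross-entropy between predicted bin logits and the two-hot target."""
    target = twohot(symlog(reward), bins)
    return -np.sum(target * log_softmax(np.asarray(logits, dtype=np.float64)))

def free_bits_kl(kl, free_nats=1.0):
    """Clip the KL from below so values under the threshold are not optimized."""
    return np.maximum(kl, free_nats)

def continuation_loss(logit, cont):
    """Binary cross-entropy from a logit: softplus(logit) - cont * logit."""
    return np.logaddexp(0.0, logit) - cont * logit

# Hypothetical bin grid in symlog space, chosen only for the example.
bins = np.linspace(-20.0, 20.0, 41)
target = twohot(symlog(5.0), bins)
decoded_reward = symexp(np.sum(target * bins))
```

Decoding the two-hot target as the probability-weighted sum of bin values and applying symexp recovers the original reward, which is why the encoding loses no information inside the bin range.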
