Actor–Critic Framework

Dreamer 4 employs an imagination-based actor–critic:

  • The actor proposes actions inside rollouts imagined by the world model (a minimal sketch follows this list).

  • The critic learns a distribution over returns (values) for the states visited along those imagined rollouts.
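
A minimal sketch of this imagination loop, assuming a toy latent world model and a Gaussian policy; all module names, dimensions, and the horizon here are illustrative stand-ins, not taken from the paper:

```python
import torch

class WorldModel(torch.nn.Module):
    """Toy latent dynamics and reward heads (illustrative stand-in)."""
    def __init__(self, state_dim=32, action_dim=4):
        super().__init__()
        self.dynamics = torch.nn.Linear(state_dim + action_dim, state_dim)
        self.reward = torch.nn.Linear(state_dim, 1)

    def step(self, state, action):
        next_state = torch.tanh(self.dynamics(torch.cat([state, action], -1)))
        return next_state, self.reward(next_state).squeeze(-1)

class Actor(torch.nn.Module):
    """Gaussian policy over continuous actions (illustrative)."""
    def __init__(self, state_dim=32, action_dim=4):
        super().__init__()
        self.mean = torch.nn.Linear(state_dim, action_dim)
        self.log_std = torch.nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        return torch.distributions.Normal(self.mean(state), self.log_std.exp())

def imagine_rollout(world_model, actor, start_state, horizon=15):
    """Unroll the policy purely inside the world model: no environment
    steps are taken, so actor and critic train on imagined experience."""
    state, rewards, dists = start_state, [], []
    for _ in range(horizon):
        dist = actor(state)
        action = dist.rsample()              # reparameterized sample
        state, reward = world_model.step(state, action)
        rewards.append(reward)
        dists.append(dist)
    return torch.stack(rewards), dists
```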

Training objective:

\[ \mathcal{L}_{\text{actor}} = -\,\mathbb{E}\!\left[ \frac{\hat{R}_\tau}{S} \right] - \eta\, \mathrm{H}\big[\pi(a \mid s)\big] \]

where \(S\) normalizes returns, \(\mathrm{H}\) is the policy entropy, and \(\eta \approx 3 \times 10^{-4}\) is the entropy scale. The entropy term is subtracted so that minimizing the loss maximizes both normalized returns and exploration.
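
A worked sketch of this objective follows. The bootstrapped lambda-returns, the percentile-based return scale floored at 1, and the gamma/lam defaults are assumptions in the style of earlier Dreamer versions; the section itself only states that \(S\) normalizes returns and \(\eta\) scales the entropy term:

```python
import torch

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Bootstrapped lambda-returns R_hat over an imagined trajectory.
    rewards: [T], values: [T+1] critic estimates; gamma and lam are
    assumed hyperparameters, not taken from the paper."""
    ret = values[-1]
    out = []
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * ret)
        out.append(ret)
    return torch.stack(out[::-1])

def actor_loss(returns, dist, eta=3e-4):
    """-E[R_hat / S] - eta * H(pi). S is read here as a percentile
    return range floored at 1 (an assumption); it is detached so the
    normalizer itself receives no gradient."""
    S = (returns.quantile(0.95) - returns.quantile(0.05)).clamp(min=1.0).detach()
    entropy = dist.entropy().sum(-1).mean()
    return -(returns / S).mean() - eta * entropy
```

Because the rollout uses reparameterized samples, the returns stay differentiable and gradients flow back through the imagined dynamics to the actor; a REINFORCE-style variant for discrete actions would weight log-probabilities by the normalized returns instead.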

Optimization:

  • Adaptive gradient clipping (30%; sketched after this list)

  • LaProp optimizer

  • Replay ratio 16–32
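
The two gradient-related items above can be sketched as follows. Reading "30%" as the clip fraction of adaptive gradient clipping in the style of Brock et al. (2021), and implementing LaProp per Ziyin et al. (2020), are assumptions; the learning rate, betas, and epsilon values are illustrative defaults:

```python
import torch

def adaptive_grad_clip(params, clip_frac=0.3, eps=1e-3):
    """Scale each parameter's gradient so its norm is at most clip_frac
    times the parameter's own norm (AGC, Brock et al. 2021). Reading
    the section's "30%" as clip_frac = 0.3 is an assumption."""
    for p in params:
        if p.grad is None:
            continue
        p_norm = p.detach().norm().clamp(min=eps)   # floor tiny params
        g_norm = p.grad.detach().norm()
        max_norm = clip_frac * p_norm
        if g_norm > max_norm:
            p.grad.mul_(max_norm / g_norm)

class LaProp(torch.optim.Optimizer):
    """Minimal LaProp (Ziyin et al., 2020): like Adam, but momentum is
    accumulated over the normalized gradient instead of the raw one."""
    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            b1, b2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["m"] = torch.zeros_like(p)
                    state["v"] = torch.zeros_like(p)
                    state["t"] = 0
                state["t"] += 1
                g = p.grad
                # Second-moment estimate, as in Adam.
                state["v"].mul_(b2).addcmul_(g, g, value=1 - b2)
                v_hat = state["v"] / (1 - b2 ** state["t"])
                # Momentum over the normalized gradient (the LaProp twist).
                normalized = g / (v_hat.sqrt() + group["eps"])
                state["m"].mul_(b1).add_(normalized, alpha=1 - b1)
                m_hat = state["m"] / (1 - b1 ** state["t"])
                p.add_(m_hat, alpha=-group["lr"])
```

In a training step these compose in the usual order: loss.backward(), then adaptive_grad_clip(model.parameters()), then optimizer.step().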
