Actor–Critic Framework
Dreamer 4 employs an imagination-based actor–critic (a rollout sketch follows the list):
- The actor samples actions from latent states imagined by the world model.
- The critic learns value distributions over those imagined rollouts.
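As a rough illustration of how such an imagination loop could be wired up, the sketch below rolls the policy forward inside a stand-in world model. The interfaces `world_model_step`, `actor`, `critic`, and the `horizon` default are assumptions for the sketch, not Dreamer 4's actual API.

```python
# Minimal sketch of an imagination rollout (assumed interfaces, not Dreamer 4's code).
import jax
import jax.numpy as jnp

def imagine_rollout(key, world_model_step, actor, critic, start_state, horizon=15):
    """Roll the policy forward inside the learned world model.

    world_model_step(state, action) -> next_state   (assumed signature)
    actor(key, state) -> (action, log_prob)         (assumed signature)
    critic(state) -> value estimate                 (assumed signature)
    """
    states, actions, log_probs, values = [start_state], [], [], [critic(start_state)]
    state = start_state
    for _ in range(horizon):
        key, sub = jax.random.split(key)
        action, log_prob = actor(sub, state)      # actor samples inside imagination
        state = world_model_step(state, action)   # world model predicts the next latent
        states.append(state)
        actions.append(action)
        log_probs.append(log_prob)
        values.append(critic(state))              # critic evaluates imagined states
    return (jnp.stack(states), jnp.stack(actions),
            jnp.stack(log_probs), jnp.stack(values))
```

Returns computed from such rollouts (for example, λ-returns bootstrapped from the critic) would then feed the actor and critic losses below.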
Training objective:

\[
L_{\text{actor}} = -\,\mathbb{E}\!\left[\hat{R}_\tau / S\right] \;-\; \eta\, H\!\left(\pi(a \mid s)\right)
\]

where \(S\) normalizes the returns, \(H\) is the policy entropy, and \(\eta \approx 3\times10^{-4}\) is the entropy coefficient.
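A minimal sketch of that loss, assuming per-step log-probabilities, returns, and entropies have already been computed from imagined rollouts; the function and argument names, and the exact form of the normalization, are assumptions rather than the paper's code.

```python
import jax
import jax.numpy as jnp

def actor_loss(log_probs, returns, entropies, return_scale, eta=3e-4):
    """REINFORCE-style actor loss with normalized returns and an entropy bonus.

    log_probs:    log pi(a_t | s_t) for imagined actions
    returns:      estimated returns R_hat along the imagined trajectory
    entropies:    per-step policy entropies H(pi(. | s_t))
    return_scale: S, e.g. a running scale of the returns (assumed)
    eta:          entropy coefficient (~3e-4, as stated above)
    """
    norm_returns = jax.lax.stop_gradient(returns) / jnp.maximum(1.0, return_scale)
    policy_term = -(log_probs * norm_returns).mean()   # maximize normalized returns
    entropy_term = -eta * entropies.mean()             # reward high entropy (exploration)
    return policy_term + entropy_term
```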
Optimization (a clipping sketch follows the list):
- Adaptive gradient clipping (30%)
- LaProp optimizer
- Replay ratio of 16–32
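As a loose illustration of the adaptive clipping step: one reading of the 30% figure is a clip ratio of 0.3 relative to each parameter's norm, in the spirit of adaptive gradient clipping; that interpretation, the per-leaf granularity, and the epsilon floor are assumptions. The clipped gradients would then be handed to the LaProp update, which is left to an optimizer library here.

```python
import jax
import jax.numpy as jnp

def adaptive_grad_clip(grads, params, clip_ratio=0.3, eps=1e-3):
    """Clip each gradient leaf so its norm is at most clip_ratio * the parameter norm."""
    def clip_leaf(g, p):
        g_norm = jnp.linalg.norm(g)
        p_norm = jnp.maximum(jnp.linalg.norm(p), eps)   # floor avoids clipping to zero
        max_norm = clip_ratio * p_norm
        scale = jnp.minimum(1.0, max_norm / jnp.maximum(g_norm, 1e-8))
        return g * scale
    return jax.tree_util.tree_map(clip_leaf, grads, params)
```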
