Deep Reinforcement Learning: Model Based Reinforcement Learning


graph TD;
id1[Time of Planning]-->id2[Decision Time Planning];
id1-->id3[Background Planning];
id2-->id4[Continuous Actions];
id2-->id5[Discrete Actions];
id4-->id6[Shooting];
id4-->id7[Collocation];
id3-->id8[Simulate Environment];
id3-->id9[Assist Learning Algorithm];
id6-->id10[iLQR, DDP]:::methods;
id7-->id11[Direct collocation, STOMP]:::methods;
id5-->id12[Heuristic search, MCTS]:::methods;
id8-->id13[DYNA, MBPO]:::methods;
id9-->id14[Policy backprop, Dreamer]:::methods;
classDef methods fill:#f96;

Optimal Control and Planning

What if we knew the transition dynamics?

Often we do know the dynamics

  1. Games (e.g., Go)
  2. Easily modeled systems (e.g., navigating a car)
  3. Simulated environments (e.g., simulated robots, video games)

Often we learn the dynamics

  1. System identification - fit unknown parameters of a known model
  2. Learning - fit a general-purpose model to observed transition data
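To make system identification concrete, here is a minimal sketch under assumptions that are not from the original notes: a point mass whose velocity follows `v_next = v + (dt / mass) * u`, where the form of the model is known and only `mass` is unknown. The function names, the noise level, and the synthetic data are all hypothetical and chosen purely for illustration.

```python
import random


def simulate_velocity_steps(mass, dt, num_steps, noise_std, seed=0):
    """Generate synthetic (v, u, v_next) transitions for a point mass.

    Assumed (hypothetical) model: v_next = v + (dt / mass) * u + noise.
    """
    rng = random.Random(seed)
    data = []
    v = 0.0
    for _ in range(num_steps):
        u = rng.uniform(-1.0, 1.0)  # random force applied at this step
        v_next = v + (dt / mass) * u + rng.gauss(0.0, noise_std)
        data.append((v, u, v_next))
        v = v_next
    return data


def fit_mass(data, dt):
    """System identification: least-squares fit of the one unknown parameter.

    The known model gives (v_next - v) = theta * u with theta = dt / mass,
    so the 1-D least-squares solution is theta = sum(u * dv) / sum(u * u).
    """
    numerator = sum(u * (v_next - v) for v, u, v_next in data)
    denominator = sum(u * u for _, u, _ in data)
    theta = numerator / denominator
    return dt / theta


TRUE_MASS = 2.0  # hidden from the fitting routine
DT = 0.1

transitions = simulate_velocity_steps(TRUE_MASS, DT, num_steps=500, noise_std=1e-3)
estimated_mass = fit_mass(transitions, DT)
print(f"estimated mass: {estimated_mass:.3f}")
```

The "learning" alternative would replace the one-parameter physical model with a general-purpose function approximator (for example, a neural network) trained on the same kind of transition tuples.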

Model-based reinforcement learning

Model-based reinforcement learning: learn the transition dynamics, then figure out how to choose actions.
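The two stages can be sketched on a toy problem. Everything specific here is an assumption made for illustration, not something the notes prescribe: a scalar system `x' = a*x + b*u`, a linear model fit by least squares, a quadratic cost, and a simple random-shooting planner that replans at every step.

```python
import random

rng = random.Random(0)

# Hypothetical "true" environment, hidden from the agent: x' = a*x + b*u.
TRUE_A, TRUE_B = 1.1, 1.0


def env_step(x, u):
    """Ground-truth transition; the agent only observes inputs and outputs."""
    return TRUE_A * x + TRUE_B * u


def collect_transitions(num_samples):
    """Gather (x, u, x_next) tuples from randomly sampled states and actions."""
    data = []
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        u = rng.uniform(-2.0, 2.0)
        data.append((x, u, env_step(x, u)))
    return data


def fit_linear_model(data):
    """Stage 1: least-squares fit of x_next ~ a*x + b*u (2x2 normal equations)."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a_hat = (sxy * suu - sxu * suy) / det
    b_hat = (sxx * suy - sxu * sxy) / det
    return a_hat, b_hat


def plan_first_action(x0, a_hat, b_hat, horizon=3, num_candidates=1000):
    """Stage 2: random shooting in the learned model.

    Samples action sequences, rolls each out with the learned dynamics,
    and returns the first action of the lowest-cost sequence.
    """
    best_cost, best_action = float("inf"), 0.0
    for _ in range(num_candidates):
        actions = [rng.uniform(-2.0, 2.0) for _ in range(horizon)]
        x, cost = x0, 0.0
        for u in actions:
            x = a_hat * x + b_hat * u
            cost += x * x + 0.01 * u * u  # assumed cost: stay near 0, small effort
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action


a_hat, b_hat = fit_linear_model(collect_transitions(50))

# Control the real system, replanning with the learned model at every step.
x = 1.0
for _ in range(20):
    u = plan_first_action(x, a_hat, b_hat)
    x = env_step(x, u)
final_state = x
print(f"learned a={a_hat:.3f}, b={b_hat:.3f}, final |x|={abs(final_state):.3f}")
```

Without control this system is unstable (`a > 1`), so keeping the state near zero depends on both stages working: a sufficiently accurate model and a planner that uses it. In practice the planner could be any of the decision-time methods from the diagram at the top, such as iLQR or MCTS.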