202509282010
Status: #idea
Tags: #reinforcement_learning #ai

# Generalized policy iteration (GPI)

![[Pasted image 20250928201121.png]]

Generalized policy iteration (GPI) combines two core processes in RL:

1. Policy evaluation - update the value function so that it is consistent with the current policy (i.e., so that it estimates the expected return of following that policy from each state)
2. Policy improvement - make the policy greedy with respect to the current value function (i.e., in each state, choose the action with the largest expected return according to the current value function)

In GPI these two processes interact: the value function is updated to match the current policy, then a new policy is derived by acting greedily on the updated value function, and the cycle repeats. Neither process has to run to completion before the other starts. When both stop changing, the policy is greedy with respect to its own value function, which means both the policy and the value function are optimal.

---
# References
[[Grokking Deep Reinforcement Learning]]
[[The core mental model of reinforcement learning]]
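The evaluate/improve loop of GPI can be sketched in Python as classic policy iteration, the textbook instance of GPI in which evaluation runs until the values (nearly) stop changing before each improvement step. This is a minimal sketch under stated assumptions: the toy corridor MDP (`corridor`), the helper names (`backup`, `evaluate`, `improve`, `gpi`) and the Gym-style transition table `P[s][a] = [(prob, next_state, reward, done), ...]` are hypothetical illustrations, not code from the referenced book.

```python
from typing import Dict, List, Tuple

# Assumed Gym-style model: P[state][action] -> list of (prob, next_state, reward, done)
Transition = Tuple[float, int, float, bool]
MDP = Dict[int, Dict[int, List[Transition]]]


def backup(outcomes: List[Transition], V: Dict[int, float], gamma: float) -> float:
    """Expected one-step return of an action: sum of p * (r + gamma * V[s'])."""
    # Terminal transitions (done=True) do not bootstrap from V[s'].
    return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in outcomes)


def evaluate(pi: Dict[int, int], P: MDP, V: Dict[int, float],
             gamma: float, theta: float) -> Dict[int, float]:
    """Policy evaluation: sweep the Bellman expectation backup for pi until V stops changing."""
    while True:
        delta = 0.0
        for s in P:
            v = backup(P[s][pi[s]], V, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v  # in-place update, reused later in the same sweep
        if delta < theta:
            return V


def improve(P: MDP, V: Dict[int, float], gamma: float) -> Dict[int, int]:
    """Policy improvement: in each state pick the action with the largest backed-up value."""
    pi = {}
    for s in P:
        q = {a: backup(outcomes, V, gamma) for a, outcomes in P[s].items()}
        pi[s] = max(q, key=q.get)  # ties go to the first action listed in P[s]
    return pi


def gpi(P: MDP, gamma: float = 0.9, theta: float = 1e-8):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    pi = {s: 0 for s in P}  # arbitrary starting policy: always take action 0
    V = {s: 0.0 for s in P}
    while True:
        V = evaluate(pi, P, V, gamma, theta)
        new_pi = improve(P, V, gamma)
        if new_pi == pi:  # stable: pi is greedy w.r.t. its own value function
            return pi, V
        pi = new_pi


def corridor(n: int = 4) -> MDP:
    """Hypothetical toy MDP: states 0..n-1 in a line, action 0 = left, 1 = right.
    Stepping into state n-1 gives reward +1 and ends the episode."""
    P: MDP = {}
    for s in range(n):
        if s == n - 1:  # terminal state: both actions self-loop with no reward
            P[s] = {a: [(1.0, s, 0.0, True)] for a in (0, 1)}
            continue
        left, right = max(s - 1, 0), s + 1
        at_goal = right == n - 1
        P[s] = {
            0: [(1.0, left, 0.0, False)],
            1: [(1.0, right, 1.0 if at_goal else 0.0, at_goal)],
        }
    return P


if __name__ == "__main__":
    policy, values = gpi(corridor())
    print(policy, values)
```

On the 4-state corridor the loop settles on "always move right" for states 0 to 2 (the action chosen in terminal state 3 is arbitrary), with values of roughly 0.81, 0.9 and 1.0 for states 0, 1 and 2. Each pass through the `while` loop in `gpi` is one round of the evaluation/improvement interaction, and the `new_pi == pi` check is the "both processes stop changing" condition.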