202509282157

Status: #idea

Tags: #reinforcement_learning #ai

# Monte Carlo Control

Monte Carlo Control is conceptually simple given the frameworks we've built up. It fits neatly into the [[Generalized policy iteration (GPI)]] framework:

1. Policy Improvement - done using $\epsilon$-greedy strategies
2. Policy Evaluation - done using first-visit Monte Carlo

We alternate between Policy Improvement and Policy Evaluation after every full trajectory (episode). That is, each episode contributes a single MC-prediction step, followed by a single improvement step that makes action selection $\epsilon$-greedy with respect to the updated estimates, with $\epsilon$ decaying over time.
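
The alternation described above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions that are not part of the note: a hypothetical five-state corridor environment (`step`), tabular action values `Q`, a decay schedule of $\epsilon = 1/k$ for episode $k$, and a `max_steps` cap that simply truncates long episodes. All names (`step`, `epsilon_greedy`, `mc_control`, `greedy_policy`) are made up for the example.

```python
import random
from collections import defaultdict

N_ACTIONS = 2                      # 0 = left, 1 = right
START, LEFT_END, RIGHT_END = 2, 0, 4


def step(state, action):
    """Hypothetical toy corridor: reward 1 only for reaching the right end."""
    next_state = state + (1 if action == 1 else -1)
    reward = 1.0 if next_state == RIGHT_END else 0.0
    done = next_state in (LEFT_END, RIGHT_END)
    return next_state, reward, done


def epsilon_greedy(Q, state, epsilon, rng):
    """Policy improvement: greedy w.r.t. Q, random with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = [Q[(state, a)] for a in range(N_ACTIONS)]
    best = max(values)
    # Break ties at random so unvisited actions are still tried.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def mc_control(n_episodes=5000, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)         # action-value estimates, default 0.0
    visits = defaultdict(int)      # first-visit counts per (state, action)

    for episode in range(1, n_episodes + 1):
        epsilon = 1.0 / episode    # assumed decay schedule

        # Generate one trajectory with the current epsilon-greedy policy.
        state, trajectory = START, []
        for _ in range(max_steps):  # cap is an assumption; truncates loops
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            trajectory.append((state, action, reward))
            state = next_state
            if done:
                break

        # Policy evaluation: one first-visit MC update from this trajectory.
        first_visit = {}
        for t, (s, a, _) in enumerate(trajectory):
            first_visit.setdefault((s, a), t)
        G = 0.0
        for t in reversed(range(len(trajectory))):
            s, a, r = trajectory[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:
                visits[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / visits[(s, a)]
        # The improvement step is implicit: the next episode acts
        # epsilon-greedily with respect to the updated Q.
    return Q


def greedy_policy(Q):
    """Read off the greedy action for each non-terminal state."""
    return {s: max(range(N_ACTIONS), key=lambda a: Q[(s, a)])
            for s in range(LEFT_END + 1, RIGHT_END)}
```

Calling `greedy_policy(mc_control())` returns a dictionary mapping each non-terminal state to its greedy action. The incremental-mean update is one common way to average first-visit returns; a constant step size would be an alternative.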