202509282157
Status: #idea
Tags: #reinforcement_learning #ai
# Monte Carlo Control
Monte Carlo Control is conceptually simple given the frameworks we've built up. It fits neatly into the [[Generalized policy iteration (GPI)]] framework:
1. Policy Improvement - act $\epsilon$-greedily with respect to the current action-value estimates $Q(s, a)$
2. Policy Evaluation - update $Q(s, a)$ from sampled returns using first-visit Monte Carlo
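To make the two steps concrete, here is one common way to write them down. The symbols are assumptions for illustration and do not appear elsewhere in this note: $Q(s,a)$ is the action-value estimate, $N(s,a)$ counts first visits, $G_t$ is the discounted return from time $t$, $\gamma$ is the discount factor, and $\mathcal{A}(s)$ is the set of actions available in $s$.

$$
\pi(a \mid s) =
\begin{cases}
1 - \epsilon + \dfrac{\epsilon}{|\mathcal{A}(s)|} & \text{if } a = \arg\max_{a'} Q(s, a') \\
\dfrac{\epsilon}{|\mathcal{A}(s)|} & \text{otherwise}
\end{cases}
$$

$$
N(s,a) \leftarrow N(s,a) + 1, \qquad
Q(s,a) \leftarrow Q(s,a) + \frac{1}{N(s,a)}\bigl(G_t - Q(s,a)\bigr), \qquad
G_t = \sum_{k=0}^{T-t-1} \gamma^{k} R_{t+k+1}
$$

The update is applied only at the first time step $t$ at which the pair $(s, a)$ occurs in the episode. The incremental form is just a running mean of the first-visit returns.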
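We alternate between Policy Improvement and Policy Evaluation after every full trajectory (episode). That is, each episode contributes a single first-visit MC prediction step, which updates the value estimates from that episode's returns, followed by a single improvement step, which makes the policy $\epsilon$-greedy with respect to the updated estimates. $\epsilon$ is decayed across episodes so that exploration shrinks and the policy approaches the greedy one.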
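The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the three-state toy MDP, the helper names (`step`, `epsilon_greedy`, `generate_episode`, `mc_control`, `greedy_policy`), the schedule $\epsilon_k = 1/k$, and $\gamma = 0.5$ are all hypothetical choices made so the result is easy to check by hand.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (an assumption, not from the note):
# states 0, 1, 2; reaching state 3 ends the episode.
STAY, ADVANCE = 0, 1
ACTIONS = (STAY, ADVANCE)
N_STATES = 3

def step(state, action):
    """STAY costs -1 and keeps the state; ADVANCE pays +1 and moves right."""
    if action == ADVANCE:
        return state + 1, 1.0, state + 1 == N_STATES
    return state, -1.0, False

def epsilon_greedy(Q, state, epsilon, rng):
    """Policy improvement: greedy w.r.t. Q, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def generate_episode(Q, epsilon, rng, max_steps=100):
    """Roll out one trajectory as a list of (state, action, reward) tuples."""
    episode, state = [], 0
    for _ in range(max_steps):  # cap guards against long runs of STAY
        action = epsilon_greedy(Q, state, epsilon, rng)
        next_state, reward, done = step(state, action)
        episode.append((state, action, reward))
        if done:
            break
        state = next_state
    return episode

def mc_control(n_episodes=500, gamma=0.5, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # action-value estimates, initialised to 0
    N = defaultdict(int)    # first-visit counts per (state, action)
    for k in range(1, n_episodes + 1):
        epsilon = 1.0 / k   # assumed decay schedule
        episode = generate_episode(Q, epsilon, rng)
        # Record the first time step at which each (state, action) occurs.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        # Policy evaluation: walk backwards, accumulating the return G.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_visit[(s, a)] == t:  # first-visit update only
                N[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]
    return Q

def greedy_policy(Q):
    """Greedy action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)]
```

Calling `greedy_policy(mc_control())` should pick `ADVANCE` in every state on this toy problem. Note that `Q` is only updated between episodes, so the policy is held fixed while a trajectory is being generated, and the improvement step is implicit: the next episode acts $\epsilon$-greedily on the freshly updated `Q`.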
---
# References