202409231726
Status: #idea
Tags: #ai
# Reinforcement learning is supervised learning with uncertainty
In supervised learning, our goal is to learn a model that predict, given an input $x_i$, the correct output $y_i$
In reinforcement learning, our goal is to learn policy that predicts, given an input state $S_t$, the optimal action $A_t$ that maximizes our long run reward.
In RL set ups, we typically only have a desired "end state" in mind (e.g. win this game of chess), and so we don't know a priori which move in each state is optimal. So we instead play out a game and back propagate credit for losses/wins to the actions taken in each state along the way.
But if we had a massive dataset of states & action pairs where we *knew* the optimal action to be taken in that state to maximize reward? Then we could formulate this is as a supervised learning problem, learning a function that maps states to actions via standardized supervised learning rather than RL techniques.
---
# References