202509292207
Status: #idea
Tags: #reinforcement_learning #ai
# Large neural networks can help solve the non-stationarity problem
In value-based reinforcement learning, the targets are non-stationary: they are computed from the same value function we are updating, so they shift with every update. ([[The target moves in value-based deep reinforcement learning]])
We saw one mitigation in [[Deep Q-Network (DQN)]]: a frozen copy of the value network serves as the target network while we update the live (online) copy.
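A minimal sketch of the frozen-target idea, to make the mechanism concrete. It uses a tiny linear Q-function in place of a deep network so it runs with the standard library only. `GAMMA`, `SYNC_EVERY`, the learning rate, and the toy transition are all hypothetical values picked for illustration, not settings from DQN itself.

```python
import copy

GAMMA = 0.99       # discount factor (assumed value)
SYNC_EVERY = 100   # hypothetical number of updates between target syncs

def q_values(weights, state):
    """Linear Q-function: one weight vector per action."""
    return [sum(w * s for w, s in zip(action_w, state)) for action_w in weights]

def td_target(reward, next_state, done, target_weights):
    """Bootstrap from the frozen copy, not from the live weights."""
    if done:
        return reward
    return reward + GAMMA * max(q_values(target_weights, next_state))

def update(online_weights, target_weights, transition, lr=0.01):
    """One semi-gradient step on the online weights only."""
    state, action, reward, next_state, done = transition
    target = td_target(reward, next_state, done, target_weights)
    error = target - q_values(online_weights, state)[action]
    for i, s in enumerate(state):
        online_weights[action][i] += lr * error * s

online = [[0.0, 0.0], [0.0, 0.0]]   # 2 actions x 2 state features
frozen = copy.deepcopy(online)      # target copy, held fixed between syncs
transition = ([1.0, 0.0], 0, 1.0, [0.0, 1.0], False)  # toy (s, a, r, s', done)

for step in range(1, 301):
    update(online, frozen, transition)
    if step % SYNC_EVERY == 0:
        frozen = copy.deepcopy(online)  # targets only move at sync points
```

Between syncs, `td_target` depends only on `frozen`, so the target for a given transition stays fixed while `online` changes.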
Another way to mitigate non-stationarity is to use a sufficiently large and expressive neural network. A larger network can more easily distinguish subtle differences between states. As a result, when we update the network for a specific (state, action) pair, the update is less likely to perturb the values of similar (state, action) pairs, so the targets computed from those pairs move less.
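A toy illustration of that spillover effect, not a neural network. As a rough proxy for capacity, it compares a linear value function with a few wide Gaussian features against one with many narrow features, then measures how much one update at a state moves the value of a nearby state. The feature construction, the state values 0.50 and 0.55, and the learning rate are all assumptions made for this sketch.

```python
import math

def rbf_features(state, centers, width):
    """Gaussian bump features over a 1-D state in [0, 1]."""
    return [math.exp(-((state - c) ** 2) / (2 * width ** 2)) for c in centers]

def make_model(n_features):
    """Hypothetical stand-in for capacity: more features means narrower bumps."""
    centers = [i / (n_features - 1) for i in range(n_features)]
    width = 1.0 / (n_features - 1)
    weights = [0.0] * n_features
    return centers, width, weights

def value(state, model):
    centers, width, weights = model
    feats = rbf_features(state, centers, width)
    return sum(w * f for w, f in zip(weights, feats))

def sgd_step(state, target, model, lr=0.1):
    """One gradient step toward `target` at `state` (weights updated in place)."""
    centers, width, weights = model
    feats = rbf_features(state, centers, width)
    error = target - value(state, model)
    for i, f in enumerate(feats):
        weights[i] += lr * error * f

def interference(n_features, updated_state=0.50, nearby_state=0.55):
    """Absolute change in the nearby state's value after updating one state."""
    model = make_model(n_features)
    before = value(nearby_state, model)
    sgd_step(updated_state, target=1.0, model=model)
    return abs(value(nearby_state, model) - before)

small = interference(n_features=3)    # coarse representation
large = interference(n_features=101)  # fine-grained representation
print(f"small model: {small:.4f}, large model: {large:.4f}")
```

In this setup the coarse model shifts the nearby state's value far more than the fine-grained one does. A real deep network generalizes in more complicated ways, so this only shows the direction of the argument.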
---
# References
[[Grokking Deep Reinforcement Learning]]