Reinforcement Learning
Reinforcement Learning (RL) is a paradigm in which an agent learns reward-maximizing behavior through trial-and-error interaction with an environment. RL has achieved superhuman performance in games such as Go, chess, and several Atari titles, and is now a central component of training modern LLMs through Reinforcement Learning from Human Feedback (RLHF) [1].
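To make the trial-and-error loop concrete, here is a minimal sketch of an epsilon-greedy agent on a toy two-armed bandit. The function name `run_bandit`, the reward probabilities, and the hyperparameters are all illustrative assumptions rather than part of any library or of the later topics in this section.

```python
import random


def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy two-armed bandit (hypothetical environment)."""
    rng = random.Random(seed)
    win_prob = [0.2, 0.8]  # assumed reward probabilities, hidden from the agent
    value = [0.0, 0.0]     # agent's running estimate of each action's reward
    count = [0, 0]         # number of times each action has been tried

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: value[a])

        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < win_prob[action] else 0.0

        # Incremental sample-average update of the chosen action's estimate.
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]

    return value, count


if __name__ == "__main__":
    value, count = run_bandit()
    print("estimated values:", value)
    print("pull counts:", count)
```

The agent is never told which arm is better. It discovers this only from the rewards it receives, which is the core idea the remaining topics build on with states, value functions, and policies.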
Topics in This Section
| Topic | Description |
|---|---|
| RL Fundamentals | Agents, environments, rewards, policies |
| MDPs & Bellman Equations | Mathematical framework for RL |
| Q-Learning | Value-based learning methods |
| Policy Gradient Methods | Direct policy optimization |
| Actor-Critic | Combining value and policy methods |
| RLHF | Reinforcement Learning from Human Feedback for LLMs |
Learning Path
RL Fundamentals → MDPs & Bellman Equations → Q-Learning → Policy Gradient Methods → Actor-Critic → RLHF
Related Domains
- Need deep learning basics? See 03 - Deep Learning
- Applying RLHF to LLMs? See 05 - Generative AI
- Math foundations? See 08 - Algorithms & Math
References
[1] Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. http://incompleteideas.net/book/the-book-2nd.html