Reinforcement Learning

Reinforcement Learning (RL) is a paradigm in which an agent learns behavior that maximizes cumulative reward through trial-and-error interaction with an environment. RL has achieved superhuman performance in games and is now a core component of training modern LLMs through Reinforcement Learning from Human Feedback (RLHF). [1]
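To make the trial-and-error loop concrete, here is a minimal sketch using tabular Q-learning, one of the methods covered later in this section. Everything in it is an illustrative assumption rather than part of the course material: the five-cell corridor environment, the names `step`, `train`, `q_table`, and `greedy_policy`, and the hyperparameter values.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # action 0 steps left, action 1 steps right


def step(state, move):
    """Toy environment: apply a move, return (next_state, reward, done)."""
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both estimates tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
```

Under these assumptions the agent starts with no knowledge of the corridor, stumbles onto the goal by random exploration, and then propagates that reward backward through its value estimates until `greedy_policy` chooses "right" in every non-terminal cell.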


Topics in This Section

| Topic | Description |
| --- | --- |
| RL Fundamentals | Agents, environments, rewards, policies |
| MDPs & Bellman Equations | Mathematical framework for RL |
| Q-Learning | Value-based learning methods |
| Policy Gradient Methods | Direct policy optimization |
| Actor-Critic | Combining value and policy methods |
| RLHF | Reinforcement Learning from Human Feedback for LLMs |

Learning Path

RL Fundamentals → MDPs & Bellman → Q-Learning → Policy Gradients → Actor-Critic → RLHF

  • Need deep learning basics? See 03 - Deep Learning
  • Applying RLHF to LLMs? See 05 - Generative AI
  • Math foundations? See 08 - Algorithms & Math

References


  1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. http://incompleteideas.net/book/the-book-2nd.html