Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations...
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based methods, policy-gradient methods, model-based methods, and various other topics (e.g., multi-agent RL, RL+LLMs, and RL+inference).
An overview of RL published just a few days ago. 144 pages of goodies covering everything from basic RL theory to modern deep RL algorithms and various related niches.
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based RL, policy-gradient methods, model-based methods, and various other topics (including a very brief discussion of RL+LLMs).
OpenAI just put out a blog post about a new model trained via RL (I'm assuming this isn't the usual RLHF) to perform chain-of-thought reasoning before giving the user its answer. As usual, there's very little detail about how this is accomplished, so it's hard for me to get excited about it, but the rest of you might find this interesting.