DeepSeek-R1 and DAPO
Overview
This presentation summarizes DeepSeek-R1 and DAPO, focusing on optimization objectives, training behavior, and practical implications for reasoning performance.
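To make the optimization objectives concrete, here is a minimal sketch under stated assumptions. It is not either paper's implementation. It assumes binary rule-based rewards and per-token log-probabilities that are already computed. All function names (`group_advantages`, `keep_group`, `token_level_loss`) are hypothetical helpers.

- `group_advantages` computes the group-relative advantage used by GRPO, the RL algorithm DeepSeek-R1 is trained with.
- `keep_group` mimics DAPO's dynamic-sampling filter.
- `token_level_loss` applies DAPO's decoupled ("clip-higher") clip range with token-level averaging.

The sketch omits any KL penalty, reward shaping, and the policy model itself.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage (GRPO-style): score each sampled response
    against the mean and std of its own group, with no learned value model."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps guards sigma == 0
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_group(rewards):
    """DAPO-style dynamic-sampling filter (hypothetical helper): drop groups
    whose rewards are all equal, since every advantage would then be zero."""
    return len(set(rewards)) > 1


def token_level_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate with a decoupled clip range and token-level averaging.

    logp_new, logp_old: one list of per-token log-probs per response.
    advantages: one scalar per response, shared by all of its tokens.
    eps_low / eps_high: illustrative values; eps_high > eps_low is "clip-higher".
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)  # pessimistic PPO-style bound
            n_tokens += 1
    # Divide by the total token count across the group, not per response,
    # so longer responses contribute proportionally more to the gradient.
    return -total / n_tokens  # negated: minimizing the loss maximizes the objective


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 0.0]  # hypothetical binary correctness rewards
    print(keep_group(rewards), group_advantages(rewards))
```

Consider two responses of 3 and 1 tokens with identical old and new log-probs and advantages of +1 and -1. Token-level averaging gives a loss of -0.5. Averaging within each response first, as in the original GRPO formulation, would give 0. This difference in length weighting is what DAPO's token-level loss changes.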
Papers
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4