DeepSeek-R1 and DAPO
Overview
This presentation summarizes DeepSeek-R1 and DAPO, focusing on optimization objectives, training behavior, and practical implications for reasoning performance.
Paper 1: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper 2: DAPO: An Open-Source LLM Reinforcement Learning System at Scale