Overview

This presentation summarizes DeepSeek-R1 and DAPO, focusing on optimization objectives, training behavior, and practical implications for reasoning performance.

Papers

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Slides

Download PDF