Overview

This presentation summarizes DeepSeek-R1 and DAPO, focusing on optimization objectives, training behavior, and practical implications for reasoning performance.

Paper 1: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

Paper 2: Orca: Progressive Learning from Complex Explanation Traces of GPT-4

Slides