Overview

This presentation covers reward-design issues in reinforcement learning with verifiable rewards (RLVR) and efficient reasoning through adaptation of a very small number of parameters.

Papers

  1. Spurious Rewards: Rethinking Training Signals in RLVR
  2. Learning to Reason in 13 Parameters

Slides