Spurious Rewards and TinyLoRA | Sunwoo Bae

Overview

This presentation covers reward-design issues in reinforcement learning with verifiable rewards (RLVR) and efficient reasoning through adaptation of a very small number of parameters.

Papers

Spurious Rewards - Rethinking Training Signals in RLVR
Learning to Reason in 13 Parameters

Slides

Download PDF