Spurious Rewards and TinyLoRA
Overview
This presentation covers reward-design issues in reinforcement learning with verifiable rewards (RLVR) and efficient reasoning through adaptation of a very small number of parameters.
Papers
- Spurious Rewards: Rethinking Training Signals in RLVR
- Learning to Reason in 13 Parameters