Small Reasoning Models

Small reasoning models target strong reasoning, coding, or math performance at compact parameter counts, such as the 3B model tracked on this page.

Sources in this batch

  • VibeThinker-3B is a technical report on a 3B dense model for verifiable reasoning, using curriculum SFT, multi-domain RL, and offline self-distillation. The abstract reports strong AIME26 and LiveCodeBench results.
  • Maxime Labonne’s talk, "Everything I Learned Training Frontier Small Models", is a video source of practical lessons on training small models.

Research interest

The surprising claim is that a 3B model can approach the performance of first-tier reasoning systems on some verifiable tasks when it is trained with curriculum SFT, RL, and self-distillation and then combined with test-time scaling at inference. The open research question is whether this reflects a general recipe for compressing reasoning ability or an effect specific to particular benchmarks or tasks. This page should track replications, ablations, and evidence of out-of-distribution robustness.
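
As a rough illustration of what test-time scaling can mean on a verifiable task, the sketch below implements majority voting over several sampled final answers. This is an assumption chosen for illustration: the sources above do not say which test-time scaling method the report uses, and the function name `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Pick the most frequent non-empty answer among sampled completions.

    Illustrative only: majority voting is one common form of test-time
    scaling, not necessarily the method used in the VibeThinker-3B report.
    Returns (answer, vote_share), or (None, 0.0) if no usable answer exists.
    """
    # Normalize whitespace and drop empty or missing answers.
    valid = [a.strip() for a in answers if a and a.strip()]
    if not valid:
        return None, 0.0
    # most_common(1) returns the top (answer, count) pair; ties go to the
    # answer that appeared first in the input.
    winner, count = Counter(valid).most_common(1)[0]
    return winner, count / len(valid)


# Hypothetical final answers extracted from five sampled completions.
samples = ["42", "41", "42 ", "", "42"]
answer, share = majority_vote(samples)
print(answer, share)
```

Sampling more completions per problem spends more inference compute for a more stable answer, which is why any comparison between a small model and a larger system should state the sampling budget used on each side.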

Why it matters

Small reasoning models are important for local inference, lower serving cost, privacy-sensitive deployments, and fast iteration. They connect model training choices to local-ai-benchmarks and production constraints.