Post-Training and Efficient Learning
This page tracks prompt evolution, reinforcement learning, distillation, LoRA, quantization-aware training, and compute-as-teacher methods.
Sources in this batch
- GEPA claims reflective prompt evolution can outperform reinforcement learning.
- Evolution strategies at hyperscale and FP8 reinforcement learning give context on optimization at scale: the first concerns the optimizer, the second numeric precision.
- LoRA Without Regret and on-policy distillation target efficient adaptation.
- Compute-as-teacher and compute-optimal quantization-aware training connect inference compute and training objectives.
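For readers less familiar with LoRA, the idea behind the adapter entries above can be sketched in a few lines. This is a minimal illustration of the standard low-rank update W + (alpha / r) * B A, not code from any of the listed sources. The helper names (`matmul`, `lora_effective_weight`, `trainable_params`) and the toy dimensions are assumptions made for this example.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)            # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Compare full fine-tuning with a rank-r adapter on one weight matrix."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}

# Toy sizes chosen for illustration only.
d_out, d_in, r = 8, 8, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, small random init
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init

# With B = 0 the adapter contributes nothing, so the adapted layer starts at the base model.
w_eff = lora_effective_weight(w, a, b, alpha=4.0)
counts = trainable_params(d_out, d_in, r)
```

At these toy sizes the adapter trains 32 parameters instead of 64. The gap widens quickly for realistic layer widths, because the adapter cost grows linearly in the dimensions while the full matrix grows with their product.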
Research interest
The surprising pattern in this cluster is that many “training” improvements now operate around the model rather than inside a conventional full-weight training loop: prompt evolution, distillation, LoRA, reduced-precision RL formats such as FP8, and supervision derived from inference compute. This suggests a broader post-training design space in which the boundary between optimizer, prompt, adapter, and runtime is fluid.
Open questions:
- When does prompt/program evolution outperform gradient-based RL, and why?
- Can on-policy distillation preserve the benefits of exploration without adding serving-time cost?
- How should quantization-aware training be optimized for reasoning and tool use rather than perplexity?
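As background for the last question, quantization-aware training usually inserts a "fake quantization" step into the forward pass and uses a straight-through estimator in the backward pass. The sketch below shows one common symmetric, per-tensor variant on scalars. The function names `fake_quantize` and `ste_grad`, the fixed `scale`, and the clipped form of the estimator are assumptions for illustration, not the method of any source listed on this page.

```python
def int_range(bits):
    """Signed integer range for the given bit width, e.g. [-8, 7] for 4 bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def fake_quantize(x, scale, bits=8):
    """Quantize x to a signed integer grid, then map it back to a float."""
    qmin, qmax = int_range(bits)
    q = round(x / scale)          # note: Python rounds exact halves to even
    q = max(qmin, min(qmax, q))   # clamp to the representable range
    return q * scale

def ste_grad(x, scale, bits=8):
    """Clipped straight-through estimator for d(fake_quantize)/dx.

    Rounding is treated as the identity inside the representable range
    (gradient 1), and the gradient is zeroed where the value was clamped.
    """
    qmin, qmax = int_range(bits)
    return 1.0 if qmin * scale <= x <= qmax * scale else 0.0

in_range = fake_quantize(0.26, scale=0.1, bits=4)   # snaps to the nearest grid point
clamped = fake_quantize(10.0, scale=0.1, bits=4)    # saturates at the top of the range
```

The sketch covers only the quantizer. Which training loss to place behind it (perplexity, a reasoning metric, a tool-use success signal) is exactly what the question above leaves open.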