Post-Training and Efficient Learning

This page tracks prompt evolution, reinforcement learning, distillation, LoRA, quantization-aware training, and compute-as-teacher methods.

Sources in this batch

GEPA claims reflective prompt evolution can outperform reinforcement learning.
Evolution strategies at hyperscale and FP8 reinforcement learning give context on optimization at scale.
LoRA Without Regret and on-policy distillation target efficient adaptation.
Compute-as-teacher and compute-optimal quantization-aware training connect inference compute and training objectives.

Research interest

The surprising pattern in this cluster is that many “training” improvements now operate around the model: prompt evolution, distillation, LoRA, RL precision formats, and supervision derived from inference compute. This suggests a broader post-training design space in which the boundaries between optimizer, prompt, adapter, and runtime are fluid.

Open questions:

When does prompt/program evolution outperform gradient-based RL, and why?
Can on-policy distillation preserve exploration benefits without serving-time cost?
How should quantization-aware training be optimized for reasoning and tool-use rather than perplexity?
