Open Models and Gemma 4
This page tracks open and locally runnable model releases, especially Gemma 4, Qwen coder variants, Step models, and Bonsai, along with browser and runtime tooling such as Transformers.js.
Sources in this batch
- LessWrong asks how far behind open models are.
- Google and Maarten Grootendorst provide Gemma 4 overviews, including multi-token prediction drafters.
- AMD documents day-0 Gemma 4 support on AMD processors and GPUs.
- Hugging Face hosts Step-3.7-Flash-GGUF and Qwen3-Coder-Next-GGUF variants.
- Transformers.js 4.0.0 and Bonsai indicate continued expansion of model deployment surfaces.
- A hardware-design video asks how hardware is reshaping LLM design.
Research interest
The notable thread across these sources is the convergence of open weights, local runtimes, and architecture/runtime co-design. Gemma 4’s multi-token prediction drafters stand out because they target inference speed at the architecture level, not only through kernels or quantization. The open-model gap should be tracked empirically rather than as a single number, since it may differ across reasoning, coding, multimodal tasks, latency, and local deployability.
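To make the inference-speed point concrete, here is a rough sketch of why drafting several tokens per step can help. It assumes the drafters are used in a speculative-decoding style draft-then-verify loop, which the sources in this batch do not spell out, and it uses the standard simplified analysis in which each drafted token is accepted independently with probability alpha. The function names, alpha, gamma, and draft_cost_ratio are illustrative assumptions, not Gemma 4 parameters.

```python
def expected_tokens_per_verify(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    alpha: assumed per-token acceptance probability of drafted tokens (0..1).
    gamma: number of tokens drafted before each verification pass.
    Uses the geometric-series result (1 - alpha**(gamma + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the verifier.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost_ratio: float) -> float:
    """Rough wall-clock speedup over plain one-token-at-a-time decoding.

    draft_cost_ratio: assumed cost of one draft step relative to one
    target-model pass (hypothetical; measure it on real hardware).
    """
    if draft_cost_ratio < 0.0:
        raise ValueError("draft_cost_ratio must be non-negative")
    cost_per_cycle = gamma * draft_cost_ratio + 1.0
    return expected_tokens_per_verify(alpha, gamma) / cost_per_cycle


if __name__ == "__main__":
    # Sweep a few made-up acceptance rates to see how sensitive the estimate is.
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(estimated_speedup(alpha, gamma=4, draft_cost_ratio=0.05), 2))
```

The estimate is only as good as the measured acceptance rate and draft cost, so it is best read as a way to frame measurements rather than to predict them.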
Open questions:
- Does multi-token prediction materially change the serving economics for small and mid-size models?
- Are open models behind mainly in post-training/evals, data, architecture, or deployment polish?
- Which model families are actually useful under local hardware constraints rather than only on leaderboards?
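For the local-hardware question, a first-pass filter is whether a model's weights fit in memory at a given quantization level. The sketch below is back-of-the-envelope arithmetic only: the parameter counts, bit widths, and the 1.2x overhead factor for KV cache and runtime buffers are placeholder assumptions, not figures for any model named on this page.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billion and bits_per_weight must be positive")
    total_bytes = params_billion * 1e9 * bits_per_weight / 8.0
    return total_bytes / GIB


def fits_locally(params_billion: float, bits_per_weight: float,
                 memory_budget_gib: float, overhead_factor: float = 1.2) -> bool:
    """True if weights plus an assumed overhead fit in the memory budget.

    overhead_factor is a hypothetical allowance for KV cache, activations,
    and runtime buffers; real overhead depends on context length and runtime.
    """
    needed = weight_memory_gib(params_billion, bits_per_weight) * overhead_factor
    return needed <= memory_budget_gib


if __name__ == "__main__":
    # Hypothetical 7B-parameter model on a machine with 8 GiB available.
    for bits in (16, 8, 4):
        size = weight_memory_gib(7.0, bits)
        print(bits, round(size, 2), fits_locally(7.0, bits, memory_budget_gib=8.0))
```

A check like this says nothing about quality or latency after quantization, so it narrows the candidate list rather than answering the usefulness question.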