Open Models and Gemma 4

This page tracks open and locally runnable model releases, especially Gemma 4, Qwen coder variants, Step models, and Bonsai, along with browser and runtime tooling such as Transformers.js.

Sources in this batch

A LessWrong post asks how far open models lag behind closed ones.
Google and Maarten Grootendorst provide Gemma 4 overviews, including multi-token prediction drafters.
AMD documents day-0 Gemma 4 support on AMD processors and GPUs.
Hugging Face hosts Step-3.7-Flash-GGUF and Qwen3-Coder-Next-GGUF variants.
Transformers.js 4.0.0 and Bonsai point to a continued widening of the places where models can be deployed.
A hardware-design video asks how hardware is reshaping LLM design.

Research interest

The surprising angle is the convergence of open weights, local runtimes, and architecture/runtime co-design. Gemma 4’s multi-token prediction drafters are notable because they attack inference speed architecturally, not only through kernels or quantization. The open-model-gap question should be tracked empirically: gaps may differ across reasoning, coding, multimodal tasks, latency, and local deployability.
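To make the speed argument concrete, here is a minimal sketch of the standard expected-throughput model for draft-and-verify (speculative) decoding. It assumes the drafters are used in that style, that each drafted token is accepted independently with probability alpha, and that drafting one token costs a fixed fraction of a target forward pass. The names alpha, k, and draft_cost_ratio are illustrative, and none of the numbers are measurements of Gemma 4.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    alpha: assumed per-token acceptance probability of the drafter (0..1).
    k:     number of tokens drafted before each verification pass.
    Under the independence assumption this is the geometric sum
    1 + alpha + alpha^2 + ... + alpha^k.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, k: int, draft_cost_ratio: float) -> float:
    """Estimated speedup over plain one-token-per-pass decoding.

    draft_cost_ratio: assumed cost of drafting one token, as a fraction
    of one target forward pass (hypothetical parameter).
    Each step costs k draft tokens plus one verification pass.
    """
    step_cost = k * draft_cost_ratio + 1.0
    return expected_tokens_per_step(alpha, k) / step_cost


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the estimate moves.
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(expected_speedup(alpha, k=4, draft_cost_ratio=0.05), 2))
```

The estimate is only as good as its inputs: acceptance rates vary by task, and a drafter that is cheap but rarely accepted can yield a speedup below 1.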

Open questions:

Does multi-token prediction materially change the serving economics for small and mid-size models?
Are open models behind mainly in post-training/evals, data, architecture, or deployment polish?
Which model families are actually useful under local hardware constraints rather than only on leaderboards?
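For the local-hardware question, a first filter is whether the weights fit in memory at all. The sketch below is a rough back-of-the-envelope check, assuming weight memory is parameter count times bits per weight, plus a flat allowance for KV cache and runtime buffers. The function names, the overhead_gb default, and the example sizes are hypothetical and do not describe any specific model listed above.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if n_params < 0 or bits_per_weight <= 0:
        raise ValueError("n_params must be >= 0 and bits_per_weight > 0")
    return n_params * bits_per_weight / 8 / 1e9


def fits_locally(
    n_params: float,
    bits_per_weight: float,
    budget_gb: float,
    overhead_gb: float = 1.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """True if the estimated footprint fits within the memory budget."""
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb <= budget_gb


if __name__ == "__main__":
    # Hypothetical 7B-parameter model on a machine with 8 GB available.
    for bits in (16, 8, 4):
        print(bits, weight_memory_gb(7e9, bits), fits_locally(7e9, bits, budget_gb=8.0))
```

Fitting in memory is necessary but not sufficient: real quantization formats carry extra per-block metadata, and usable speed also depends on memory bandwidth and context length.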
