Open Model Leaderboards and Ecosystem

This page tracks open-weight model rankings, state-of-AI usage data, and model-provider ecosystem changes.

Sources in this batch

  • Onyx’s self-hosted LLM leaderboard and LM Council benchmarks rank open and proprietary systems.
  • OpenRouter’s State of AI adds a usage-scale perspective: which models are actually being used, rather than how they score.
  • GPT-OSS evaluation, Mistral 3, Arcee, OLMo, GLM-4.6-GGUF, ModernVBERT, and Qwen updates show how the model-provider ecosystem is changing.

Research interest

The core research problem is measurement: open-model progress is multi-dimensional, but leaderboards often collapse it into a single score. For local or self-hosted work, deployment constraints, context length, tool use, multimodality, license, and quantization support may matter more than benchmark rank.
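
A minimal sketch of why a single score can mislead: the same two models swap places depending on how the dimensions are weighted. The model names, dimensions, scores, and weights below are all made up for illustration and are not measurements of any real system.

```python
from typing import Dict, List

# Hypothetical, made-up scores in [0, 1]; not measurements of any real model.
SCORES: Dict[str, Dict[str, float]] = {
    "model_a": {"benchmark": 0.90, "context": 0.40, "tool_use": 0.50, "quantization": 0.30},
    "model_b": {"benchmark": 0.75, "context": 0.80, "tool_use": 0.80, "quantization": 0.90},
}

# A leaderboard-style view that only looks at benchmark score (assumed weights).
LEADERBOARD_WEIGHTS: Dict[str, float] = {"benchmark": 1.0}

# A self-hosted view that weights every dimension equally (assumed weights).
SELF_HOSTED_WEIGHTS: Dict[str, float] = {
    "benchmark": 0.25,
    "context": 0.25,
    "tool_use": 0.25,
    "quantization": 0.25,
}


def rank(scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Collapse each model's dimensions into one weighted sum and sort best-first."""

    def total(name: str) -> float:
        # Dimensions missing from `weights` contribute nothing to the total.
        return sum(weights.get(dim, 0.0) * value for dim, value in scores[name].items())

    return sorted(scores, key=total, reverse=True)


if __name__ == "__main__":
    print("leaderboard view:", rank(SCORES, LEADERBOARD_WEIGHTS))
    print("self-hosted view:", rank(SCORES, SELF_HOSTED_WEIGHTS))
```

With these assumed numbers, model_a leads on benchmark score alone and model_b leads once the other dimensions count. The weighted sum is only one possible way to aggregate; the point is that any fixed weighting encodes a choice about what matters.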

Open questions:

  • What benchmark mix best predicts usefulness for agentic research workflows?
  • How do open-weight models compare after controlling for tool scaffolding and inference budget?
  • Are usage studies a better signal of practical value than leaderboards?
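
One way to read "controlling for tool scaffolding and inference budget" is to compare two models only on runs that share the same scaffold and budget. The sketch below assumes a hypothetical record format (Run) and made-up scores; it is an illustration of matched comparison, not an established evaluation protocol.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Run(NamedTuple):
    model: str
    scaffold: str    # hypothetical tool-scaffolding label
    max_tokens: int  # hypothetical inference budget
    score: float


# Made-up records for illustration only.
RUNS: List[Run] = [
    Run("model_a", "plain", 4096, 0.52),
    Run("model_b", "plain", 4096, 0.55),
    Run("model_a", "agent", 32768, 0.71),
    Run("model_b", "agent", 32768, 0.66),
    Run("model_a", "agent", 4096, 0.60),  # no matching model_b run, so it is skipped
]


def matched_differences(
    runs: List[Run], first: str, second: str
) -> Dict[Tuple[str, int], float]:
    """Return score(first) - score(second) for each setting where both models ran."""
    by_setting: Dict[Tuple[str, int], Dict[str, float]] = defaultdict(dict)
    for run in runs:
        by_setting[(run.scaffold, run.max_tokens)][run.model] = run.score
    return {
        setting: scores[first] - scores[second]
        for setting, scores in by_setting.items()
        if first in scores and second in scores
    }


if __name__ == "__main__":
    for setting, diff in matched_differences(RUNS, "model_a", "model_b").items():
        print(setting, round(diff, 2))
```

Settings where only one model was run are dropped rather than averaged in, so an unmatched high-budget run cannot inflate one side of the comparison.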