Open Model Leaderboards and Ecosystem
This page tracks open-weight model rankings, state-of-AI usage data, and model-provider ecosystem changes.
Sources in this batch
- Onyx’s self-hosted LLM leaderboard and LM Council benchmarks rank open and proprietary systems.
- OpenRouter’s State of AI report adds a usage-based view: which models receive traffic at scale, rather than how they score on benchmarks.
- The GPT-OSS evaluation and the Mistral 3, Arcee, OLMo, GLM-4.6-GGUF, ModernVBERT, and Qwen updates show where the ecosystem is moving.
Research interest
The interesting research problem is measurement: open-model progress is multi-dimensional, yet leaderboards often collapse it into a single score. For local or self-hosted work, deployment constraints, context length, tool use, multimodality, license, and quantization support may matter more than benchmark rank.
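A minimal sketch of why the single-score collapse matters. The model names, dimensions, scores, and weights below are all hypothetical, invented purely for illustration and not taken from any leaderboard. The same per-dimension scores produce a different ordering once the weights reflect self-hosted priorities instead of benchmark rank alone.

```python
# Hypothetical per-dimension scores in [0, 1]; made-up values, not real measurements.
SCORES = {
    "model_a": {"benchmark": 0.90, "context": 0.40, "tool_use": 0.50, "quantization": 0.30},
    "model_b": {"benchmark": 0.75, "context": 0.80, "tool_use": 0.70, "quantization": 0.90},
    "model_c": {"benchmark": 0.80, "context": 0.60, "tool_use": 0.90, "quantization": 0.60},
}


def rank(scores, weights):
    """Return model names ordered by weighted sum of their scores, best first."""
    def total(name):
        # Dimensions missing from `weights` contribute nothing.
        return sum(weights.get(dim, 0.0) * value for dim, value in scores[name].items())
    return sorted(scores, key=total, reverse=True)


# Assumed weightings: one that looks only at benchmark score,
# and one that treats all four dimensions equally.
LEADERBOARD_WEIGHTS = {"benchmark": 1.0}
SELF_HOSTED_WEIGHTS = {"benchmark": 0.25, "context": 0.25, "tool_use": 0.25, "quantization": 0.25}

leaderboard_order = rank(SCORES, LEADERBOARD_WEIGHTS)
self_hosted_order = rank(SCORES, SELF_HOSTED_WEIGHTS)

print(leaderboard_order)  # ['model_a', 'model_c', 'model_b']
print(self_hosted_order)  # ['model_b', 'model_c', 'model_a']
```

With these invented numbers the top and bottom models swap places, which is the point: the ranking is a property of the weights as much as of the models.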
Open questions:
- What benchmark mix best predicts usefulness for agentic research workflows?
- How do open-weight models compare after controlling for tool scaffolding and inference budget?
- Are usage studies a better signal of practical value than leaderboards?
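One way to start on the last question is to check how far a leaderboard ordering and a usage ordering agree at all. The sketch below computes Spearman's rank correlation for two tie-free rankings; the model names and ranks are hypothetical placeholders, not data from any source listed here. It measures agreement only and says nothing about which signal is the better one.

```python
def spearman(rank_x, rank_y):
    """Spearman's rho for two tie-free rankings over the same set of items."""
    n = len(rank_x)
    # Sum of squared rank differences across all items.
    d2 = sum((rank_x[k] - rank_y[k]) ** 2 for k in rank_x)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical ranks (1 = best); made up for illustration.
leaderboard_rank = {"model_a": 1, "model_b": 2, "model_c": 3, "model_d": 4}
usage_rank = {"model_a": 3, "model_b": 1, "model_c": 4, "model_d": 2}

rho = spearman(leaderboard_rank, usage_rank)
print(rho)  # 0.0
```

A value near 1 would suggest the two signals carry similar information, while a value near 0, as with these invented ranks, would suggest they measure different things and are worth tracking separately.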