LLM Benchmark Rankings
Compare and rate the top 10 large language models across coding, agentic ability, reasoning, math, and language capabilities.
Coding: Code generation and understanding
Agentic Ability: Autonomous decision making
Reasoning: Logical problem solving
Math: Mathematical computation
Language: Natural language processing
Rate Models
Share your experience with these language models by rating their performance across different dimensions.
View Leaderboard
Explore detailed comparisons and rankings of the top language models based on various performance metrics.
Current Rankings
| Model | Coding | Agentic | Reasoning | Math | Language | Average |
|---|---|---|---|---|---|---|
| GPT-4o | 9 | 8.5 | 9 | 8.5 | 9.5 | 8.9 |
| Claude 3.7 Sonnet | 8.5 | 8 | 8.5 | 8 | 9 | 8.4 |
| Gemini 2.5 Pro | 8 | 8.5 | 8 | 7.5 | 8.5 | 8.1 |
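
The Average column appears to be the unweighted mean of the five category scores, rounded to one decimal place. The page does not state this formula; it is an assumption that happens to match the three rows shown. A minimal Python sketch under that assumption, where the `scores` dictionary and the `average_score` helper are hypothetical names introduced only for illustration:

```python
from statistics import mean

# Category scores copied from the rankings table, in the order:
# Coding, Agentic, Reasoning, Math, Language.
scores = {
    "GPT-4o": [9, 8.5, 9, 8.5, 9.5],
    "Claude 3.7 Sonnet": [8.5, 8, 8.5, 8, 9],
    "Gemini 2.5 Pro": [8, 8.5, 8, 7.5, 8.5],
}


def average_score(values):
    """Unweighted mean rounded to one decimal (assumed formula, not confirmed by the page)."""
    return round(mean(values), 1)


# Compute each model's average, then order models from highest to lowest.
averages = {model: average_score(values) for model, values in scores.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for model in ranking:
    print(f"{model}: {averages[model]}")
```

If the site weights some categories more heavily than others, the mean would need to be replaced by a weighted sum; nothing on the page indicates that it does.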