Compare over 100 AI Models
Ranking the performance of top LLMs from OpenAI, Google, DeepSeek & others across intelligence, price, and speed.
HIGHLIGHTS
QUALITY vs PRICE
Higher quality, lower price = better value.
[Scatter chart plotting each model's Quality index against blended price (USD per 1M tokens); the plotted values appear in the leaderboard table below.]
| # | Model | Creator | Quality ↓ | Speed | Latency | Blended $/1M | Output $/1M |
|---|---|---|---|---|---|---|---|
| 1 | o1 (Reasoning) | OpenAI | 92.3 | 45 t/s | 1.20s | $26.25 | $60.00 |
| 2 | DeepSeek-R1 (Reasoning, Open) | DeepSeek | 91.8 | 65 t/s | 0.90s | $0.96 | $2.19 |
| 3 | o3-mini (Reasoning) | OpenAI | 89.1 | 70 t/s | 0.80s | $1.93 | $4.40 |
| 4 | GPT-4o | OpenAI | 88.7 | 105 t/s | 0.35s | $7.50 | $15.00 |
| 5 | Llama 3.1 405B (Open) | Meta | 88.6 | 55 t/s | 0.60s | $2.70 | $2.70 |
| 6 | DeepSeek-V3 (Open) | DeepSeek | 88.5 | 110 t/s | 0.40s | $0.48 | $1.10 |
| 7 | Claude 3.5 Sonnet | Anthropic | 88.3 | 85 t/s | 0.42s | $6.00 | $15.00 |
| 8 | Grok 2 | xAI | 87.5 | 85 t/s | 0.45s | $4.00 | $10.00 |
| 9 | Claude 3 Opus | Anthropic | 86.8 | 40 t/s | 0.85s | $30.00 | $75.00 |
| 10 | Llama 3.3 70B (Open) | Meta | 86.2 | 80 t/s | 0.45s | $0.70 | $0.70 |
| 11 | Gemini 1.5 Pro | Google | 85.9 | 90 t/s | 0.55s | $5.25 | $10.50 |
| 12 | Gemini 2.0 Flash | Google | 84.1 | 175 t/s | 0.28s | $0.18 | $0.40 |
| 13 | Mistral Large 2 | Mistral | 84.0 | 75 t/s | 0.50s | $3.00 | $6.00 |
| 14 | GPT-4o mini | OpenAI | 82.0 | 150 t/s | 0.25s | $0.26 | $0.60 |
| 15 | Claude 3.5 Haiku | Anthropic | 81.5 | 130 t/s | 0.30s | $1.60 | $4.00 |
| 16 | Gemini 1.5 Flash | Google | 78.9 | 160 t/s | 0.35s | $0.13 | $0.30 |
| 17 | Llama 3.1 8B (Open) | Meta | 73.0 | 200 t/s | 0.20s | $0.10 | $0.10 |
KEY DEFINITIONS
Context Window
Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit, which varies by model.
Output Speed
Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models that support streaming).
Latency (TTFT)
Time to first token received, in seconds, after API request sent. For reasoning models, this will be the first reasoning token.
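The two timing definitions above can be measured against any streaming response. Below is a minimal sketch, assuming `chunks` is a hypothetical iterator of token strings from a streaming API (the iterator name and shape are illustrative, not tied to a specific SDK):

```python
import time

def stream_metrics(chunks):
    """Measure TTFT and output speed over a stream of token chunks.

    TTFT = seconds from request start to the first chunk received.
    Output speed = tokens per second after the first chunk arrives.
    """
    start = time.perf_counter()
    first_at = None
    n_tokens = 0
    for _token in chunks:
        if first_at is None:
            first_at = time.perf_counter()  # first token: this fixes TTFT
        n_tokens += 1
    end = time.perf_counter()
    if first_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    gen_time = end - first_at
    # Tokens generated after the first one, divided by time spent generating them.
    speed = (n_tokens - 1) / gen_time if gen_time > 0 else float("inf")
    return ttft, speed
```

Passing a real streaming response iterator in place of `chunks` yields the same TTFT and t/s figures reported in the leaderboard, modulo network jitter.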
Price
Price per token, represented as USD per million tokens. Price is a blend of input and output token prices at a 3:1 input:output ratio, i.e. (3 × input price + output price) / 4.
Output Price
Price per token generated by the model (received from the API), represented as USD per million tokens.
Input Price
Price per token included in the request/message sent to the API, represented as USD per million tokens.
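As a worked example of the blended-price definition, the table's blended figures can be reproduced from per-million input and output prices with a 3:1 weighting. The input prices below are inferred by working backward from the table's blended and output columns, so treat them as assumptions rather than published rates:

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Blend input/output USD-per-1M-token prices at a 3:1 input:output ratio."""
    return (3 * input_per_m + output_per_m) / 4

# Inferred input prices (assumption): o1 ~ $15.00/1M, GPT-4o ~ $5.00/1M.
print(blended_price(15.00, 60.00))  # o1: matches the table's $26.25
print(blended_price(5.00, 15.00))   # GPT-4o: matches the table's $7.50
```

The 3:1 weighting reflects that typical workloads send roughly three input tokens for every output token generated.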
FREQUENTLY ASKED QUESTIONS
Which LLM is the best overall?
Based on our quality index scoring, the top-ranked model changes as providers release updates. Currently, reasoning-focused models from OpenAI and DeepSeek score the highest on overall intelligence benchmarks.
Which models are the fastest?
Lightweight models like Llama 3.1 8B and Gemini 2.0 Flash typically achieve the highest tokens-per-second rates, often exceeding 150-200 t/s depending on the API provider.
Which models are the cheapest?
Open-source models served through competitive providers offer the best price-per-token. Models like Gemini 1.5 Flash and DeepSeek-V3 provide excellent quality-to-price ratios.
What is the best open-weights model?
DeepSeek-R1 currently leads among open-weights models by quality score, followed closely by Llama 3.1 405B and DeepSeek-V3.
How do I filter the leaderboard by provider?
Use the provider filter tabs above the leaderboard table to narrow results by provider. You can also sort any column by clicking its header.
Where can I find more detail on a specific model?
Click on any model name in the leaderboard to visit its dedicated page with detailed benchmark results, pricing breakdowns, and speed measurements across different providers.