Provider Comparison Methodology
Reviewed by Byron Malone · Last reviewed .
Primary sources
Provider pricing is sourced directly from official pricing pages (see Token Pricing Methodology above). Capability benchmarks are sourced from: LMSYS Chatbot Arena ELO scores (lmarena.ai), which are built from crowd-sourced pairwise comparisons by more than 100,000 users; the HumanEval coding benchmark (published by OpenAI, 2021); the MMLU academic benchmark (Hendrycks et al., 2021); and each provider's own model card benchmark disclosures.
We treat LMSYS ELO as the most reliable general capability proxy because it reflects real user preferences across diverse tasks rather than performance on researcher-curated test sets. ELO scores are updated continuously as new votes are collected.
Quality-adjusted cost
Quality-adjusted cost = price per 1M tokens / (ELO score / reference_ELO). The examples here use a reference_ELO of 1,200. A model with 1,200 ELO at $3/1M input has a quality-adjusted cost of $3 / (1200/1200) = $3. A model with 1,000 ELO at $1/1M input has a quality-adjusted cost of $1 / (1000/1200) = $1.20: the adjustment raises its effective cost 20% above the raw price, although it still comes out cheaper than the first model. This metric helps compare models across capability tiers.
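The formula above can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the function name quality_adjusted_cost and the default reference_elo of 1,200 (taken from the worked examples) are assumptions made for this sketch.

```python
def quality_adjusted_cost(price_per_1m: float, elo: float,
                          reference_elo: float = 1200.0) -> float:
    """Price per 1M tokens divided by the model's ELO relative to a reference.

    reference_elo defaults to 1200 only because the worked examples use it;
    it is an assumption of this sketch, not a fixed constant.
    """
    if elo <= 0 or reference_elo <= 0:
        raise ValueError("ELO values must be positive")
    return price_per_1m / (elo / reference_elo)


# The two worked examples from the text.
print(f"${quality_adjusted_cost(3.00, 1200):.2f}")  # $3.00
print(f"${quality_adjusted_cost(1.00, 1000):.2f}")  # $1.20
```

A model scoring above the reference gets a quality-adjusted cost below its raw price, and a model scoring below the reference gets one above it.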
Quality-adjusted cost is a simplification — ELO scores are general, and specific task performance can deviate substantially. Coding tasks favor different models than creative writing. We provide task-type filters to adjust the comparison.
Context window pricing
Providers differ in how they price long-context inputs. For example, Gemini 1.5 Pro charges $7/1M for inputs over 128K tokens vs. $3.50/1M at or below 128K (a 2x jump at the boundary). We model this pricing as a step function and show the effective rate for a user-specified context length.
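A step-function rate of this kind might be modeled as follows. This sketch makes two assumptions that should be checked against the provider's pricing page: that 128K means exactly 128,000 tokens, and that the higher rate applies to the whole input once it crosses the threshold rather than only to the tokens above it. The function names are illustrative.

```python
def effective_rate(context_tokens: int, threshold: int = 128_000,
                   base_rate: float = 3.50, long_rate: float = 7.00) -> float:
    """Return the $/1M input rate for a given context length.

    Assumption: the tier is chosen by total input length, and the chosen
    rate applies to every token in the input (not a marginal rate).
    """
    if context_tokens < 0:
        raise ValueError("context_tokens must be non-negative")
    return long_rate if context_tokens > threshold else base_rate


def input_cost(context_tokens: int, **rate_kwargs) -> float:
    """Dollar cost of one input of the given length."""
    return context_tokens / 1_000_000 * effective_rate(context_tokens, **rate_kwargs)


for tokens in (100_000, 128_000, 200_000):
    print(tokens, effective_rate(tokens), f"${input_cost(tokens):.2f}")
```

Under these assumptions a 200,000-token input is billed entirely at the long-context rate, so crossing the boundary by a single token doubles the cost of the whole request.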
Limitations
ELO scores change as new models enter the arena — a model's ELO can shift significantly over months. Quality-adjusted cost is a portfolio-average metric; your specific use case may be better served by a cheaper model (simple extraction) or require the most capable model (complex reasoning). Enterprise contracts with volume discounts change the economics significantly.
Update protocol
This category is reviewed quarterly. Immediate updates are triggered by changes to the primary sources listed above, such as pricing page revisions, benchmark or leaderboard updates, or revised model card disclosures.
Error reports go to info@bedrockatools.com. Corrections are published on our corrections page.