LLM cost per million tokens explained: input, output, and caching
Updated May 23, 2026 · Byron Malone
LLM APIs charge per million tokens (input and output separately). A $3/M input rate means one million input tokens costs $3. A typical API call (1,500 input + 400 output tokens) with Claude 3.5 Sonnet ($3/M in, $15/M out) costs $0.0105. At 10,000 calls/day, that's $105/day or $3,150/month — before user growth, model upgrades, or architectural inefficiencies.
What is a token and why it's priced the way it is
A token is the unit of text that language models process. Tokenization splits text into subword units using algorithms like Byte Pair Encoding (BPE). In English, a token is approximately 4 characters or 0.75 words. Common benchmarks:
- 'Hello, world!' = 4 tokens
- A 100-word paragraph = ~130 tokens
- A 500-word system prompt = ~650 tokens
- A 2,000-word document = ~2,600 tokens
- A 10,000-token context window ≈ 7,500 words ≈ 30 pages
Tokenization varies slightly between models. GPT and Claude models tokenize similarly for English text; both tokenize non-English text at higher rates (more tokens per word for languages like Chinese, Japanese, Arabic).
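The rules of thumb above can be turned into a quick estimator. This is a minimal sketch for English text only; the function name `estimate_tokens` is hypothetical, and it applies the 4-characters and 0.75-words heuristics rather than a real tokenizer, so treat the result as a ballpark figure.

```python
# Rule-of-thumb constants taken from the text (English only).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> dict:
    """Return two rough token estimates: one from characters, one from words."""
    chars = len(text)
    words = len(text.split())
    return {
        "by_chars": round(chars / CHARS_PER_TOKEN),
        "by_words": round(words / WORDS_PER_TOKEN),
    }


# A synthetic 100-word sample; real prose will give somewhat different numbers.
sample = "word " * 100
print(estimate_tokens(sample))
```

For billing-grade numbers, count tokens with the provider's own tokenizer, since the two heuristics can disagree and neither holds for non-English text.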
Why this pricing model? Token count is the primary cost driver for LLM inference at the infrastructure level. Input tokens can be processed together in one parallel pass (attention is computed over all input tokens at once). Output tokens are generated sequentially, one token at a time, and each one requires its own forward pass. This compute asymmetry is why output tokens are typically priced 3-5x higher than input tokens.
The real cost of a typical LLM call in production
Let's work through a concrete example for a customer service chatbot:
- System prompt (product knowledge, tone guidelines, safety rules): 3,000 tokens
- Conversation history (3 turns): 500 tokens
- User message: 150 tokens
- Total input: 3,650 tokens
- Generated response: 400 tokens
Cost using Claude 3.5 Sonnet ($3/M input, $15/M output):
- Input: 3,650/1,000,000 × $3 = $0.01095
- Output: 400/1,000,000 × $15 = $0.006
- Total: $0.01695/call
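The per-call arithmetic can be sketched as a small helper. The `Pricing` class and `call_cost` function are illustrative names, not part of any SDK; the rates are the Claude 3.5 Sonnet figures quoted in this article and should be checked against the current price list before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def call_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one call: input and output tokens are billed at separate rates."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Rates as quoted in the article (assumed current).
SONNET = Pricing(input_per_million=3.00, output_per_million=15.00)

# Chatbot example: system prompt + conversation history + user message.
input_tokens = 3_000 + 500 + 150
cost = call_cost(input_tokens, 400, SONNET)
print(f"${cost:.5f} per call")
```

Keeping the price table separate from the formula makes it easy to re-run the same workload against a different model's rates.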
Assume 200 daily active users who together make 500 calls/day (about 2.5 messages per user):
- Daily cost: 500 calls/day × $0.01695 = $8.48/day
- Monthly cost (30 days): $254/month
- Cost per DAU: $254/200 = $1.27/month
Scaling to 10,000 DAU at the same usage per user: about $12,700/month in LLM costs alone. At a $29/month subscription, $1.27 per user means 4.4% of subscription revenue goes to LLM costs, which is sustainable. The picture changes at $5/DAU, which is about 17% of a $29 subscription and forces a rethink of the architecture or pricing.
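The unit-economics check can be sketched as follows. It uses the 500 calls/day, 200 DAU, and $29/month figures from the example, and assumes a 30-day month (which is what the $8.48/day to $254/month step implies). The function names are hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption: matches the article's daily-to-monthly step


def monthly_cost_per_dau(calls_per_day: float, cost_per_call: float, dau: int) -> float:
    """Monthly LLM spend divided by daily active users."""
    return calls_per_day * cost_per_call * DAYS_PER_MONTH / dau


def llm_share_of_revenue(cost_per_dau: float, subscription_price: float) -> float:
    """Fraction of one user's subscription revenue consumed by LLM cost."""
    return cost_per_dau / subscription_price


per_dau = monthly_cost_per_dau(calls_per_day=500, cost_per_call=0.01695, dau=200)
share = llm_share_of_revenue(per_dau, subscription_price=29.0)

print(f"Cost per DAU: ${per_dau:.2f}/month")
print(f"Share of subscription revenue: {share:.1%}")
print(f"At 10,000 DAU: ${per_dau * 10_000:,.0f}/month")
```

The 10,000 DAU figure printed here comes out slightly above $12,700 because it uses the unrounded per-user cost.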
Input vs output token cost asymmetry: architectural implications
The 3-5x price premium for output tokens has real architectural implications:
High output-cost use cases (avoid with expensive models if possible):
- Long-form content generation (blog posts, reports, code)
- Multi-step chain-of-thought reasoning that generates extensive intermediate text
- Conversational AI where users expect lengthy responses
Low output-cost use cases (can use expensive models economically):
- Classification (output: one label, ~5 tokens)
- Entity extraction (output: structured JSON of extracted entities, ~50-200 tokens)
- Short-answer Q&A (output: 1-3 sentences, ~50-100 tokens)
- Moderation/safety checks (output: yes/no decision)
Architectural strategy: use expensive frontier models (GPT-4o, Claude 3.5 Sonnet) for tasks where quality is critical AND output is short. Use cheaper mid-tier models (GPT-4o mini, Haiku) for high-output, lower-precision tasks. This 'routing' approach can cut total LLM costs by 40-70% without meaningfully degrading user experience.
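A minimal sketch of the routing rule, assuming a two-tier setup. Everything here is illustrative: the workload mix, the 200-token threshold, and the "cheap" tier price are placeholders rather than measured data or quoted list prices, and the `route` function is a hypothetical helper, not a library API.

```python
# (input $/M, output $/M)
PRICES = {
    "frontier": (3.00, 15.00),  # Claude 3.5 Sonnet rates as quoted in the article
    "cheap": (0.25, 1.25),      # placeholder mid-tier rates; substitute real ones
}


def call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def route(quality_critical: bool, expected_output_tokens: int,
          short_output_limit: int = 200) -> str:
    """Frontier model only when quality is critical AND the output is short."""
    if quality_critical and expected_output_tokens <= short_output_limit:
        return "frontier"
    return "cheap"


# Hypothetical daily workload: (name, calls, input tokens, output tokens, quality critical)
workload = [
    ("classification", 5_000, 800, 5, True),
    ("entity_extraction", 2_000, 1_200, 150, True),
    ("long_form_drafts", 1_000, 1_500, 1_200, False),
]

baseline = sum(n * call_cost("frontier", i, o) for _, n, i, o, _ in workload)
routed = sum(n * call_cost(route(q, o), i, o) for _, n, i, o, q in workload)

print(f"All frontier: ${baseline:.2f}/day")
print(f"Routed:       ${routed:.2f}/day")
print(f"Savings:      {1 - routed / baseline:.0%}")
```

The savings depend entirely on the workload mix and the price gap between tiers, so run this with your own call volumes before relying on any percentage.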
The LLM Cost Calculator lets you model a mixed portfolio of calls with different models at different price points.
Total cost of ownership: what teams miss in their LLM budget
Direct API costs are only part of the LLM cost picture. Often-missed line items:
1. Evaluation runs: testing prompt changes on 500+ samples before deployment. A $0.015/call eval on 1,000 samples = $15/eval cycle. If you run 10 eval cycles per product sprint, that's $150/sprint — not trivial at scale.
2. Embedding API calls for vector search in RAG: OpenAI text-embedding-3-small at $0.02/M tokens is cheap but accumulates. Embedding 100,000 documents of 500 tokens each = 50M tokens = $1.00 for the initial corpus. Re-embedding on updates adds ongoing cost.
3. Retry and error handling overhead: production LLM applications experience 1-3% error rates (rate limits, timeouts, malformed outputs). Retry logic adds 1-3% to effective token volume.
4. Prompt engineering iteration: development phase where prompt engineers experiment with dozens of variants before finding the optimal prompt. This testing cost often isn't budgeted.
5. Model upgrade migrations: when a new model version is released, you need to re-evaluate your prompts (older prompt patterns may not work identically) — another eval run cycle.
Budget recommendation: add 25-30% to your production API cost estimate to cover these ancillary costs.
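The line items above can be sketched as a small budget model. The helper names are hypothetical; the inputs are the figures quoted in this section, and the $3,150/month production estimate is reused from the article's opening example.

```python
def eval_cost(cost_per_call: float, samples: int, cycles: int) -> float:
    """Cost of running an evaluation set a given number of times."""
    return cost_per_call * samples * cycles


def embedding_cost(documents: int, tokens_per_doc: int, price_per_million: float) -> float:
    """One-off cost of embedding a corpus."""
    return documents * tokens_per_doc / 1_000_000 * price_per_million


def with_retry_overhead(api_cost: float, error_rate: float) -> float:
    """Assumes each failed call is retried once, adding error_rate to volume."""
    return api_cost * (1 + error_rate)


def budget_with_buffer(production_api_cost: float, buffer: float) -> float:
    """Apply the recommended ancillary-cost buffer (0.25 to 0.30)."""
    return production_api_cost * (1 + buffer)


print(f"Eval cost per sprint: ${eval_cost(0.015, 1_000, 10):.2f}")
print(f"Initial embedding:    ${embedding_cost(100_000, 500, 0.02):.2f}")
print(f"API cost with 3% retries: ${with_retry_overhead(3_150, 0.03):,.2f}")

low = budget_with_buffer(3_150, 0.25)
high = budget_with_buffer(3_150, 0.30)
print(f"Budget range: ${low:,.2f} to ${high:,.2f} per month")
```

Itemizing the costs this way shows which ones scale with traffic (retries) and which scale with development activity (eval cycles), which a flat percentage buffer hides.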
By Byron Malone, Founder & Editor, Bedrocka Tools
Try the calculator
This article pairs with the LLM Cost Calculator, which applies the concepts above to your specific numbers.