Methodology · Overview

Prompt Caching Methodology

Reviewed by Byron Malone · Last reviewed 2026-05-23.

Primary sources

Prompt caching documentation is sourced from each provider's official technical documentation. Anthropic: 'Prompt caching' at docs.anthropic.com/en/docs/build-with-claude/prompt-caching. OpenAI: 'Prompt Caching' at platform.openai.com/docs/guides/prompt-caching. These are the authoritative sources for cache mechanics, pricing, and minimum token requirements.

Anthropic cache pricing: write = 125% of standard input rate (you pay to store). Read = 10% of standard input rate. Cache TTL: 5 minutes, refreshed with each access. Minimum cacheable block: 1,024 tokens. OpenAI cache pricing: 50% discount on cached input tokens, applied automatically to any repeated 1,024+ token prefix.

Hit rate modeling

Cache hit rate depends on: (1) calls per unit time vs. cache TTL; (2) consistency of the cached prefix across calls; (3) model version stability (cache is invalidated on model version change). For Anthropic's 5-minute TTL, a 1 call/minute rate yields 100% hit rate on all but the first call per 5-minute window. A 1 call/10 minutes rate yields 0% hit rate — every call is a cache miss (write at 125% cost, no cache read).

We model hit rate as: min(1.0, calls_per_ttl_window - 1) / calls_per_ttl_window. This gives 0% at 1 call/window, 50% at 2 calls/window, approaching 100% asymptotically at high call rates.

Savings formula

Expected cost without caching: tokens × standard_input_rate. Expected cost with caching: (tokens × cache_write_rate × 1/calls_per_window) + (tokens × cache_read_rate × (1 - 1/calls_per_window)). Net savings = standard_cost - caching_cost. Savings become positive when hit_rate > (write_premium / (1 - read_discount)). For Anthropic: breakeven hit rate = 25% / 90% = 27.8%. Above this hit rate, caching saves money.

Limitations

Cache is per-model-version and per-workspace — shared across API keys in the same workspace for Anthropic. Tool definitions count toward the cacheable token total. Streaming responses interact with caching identically to non-streaming. Cache invalidation on API updates can cause unexpected cost spikes.

Update protocol

This category is reviewed quarterly. Immediate updates are triggered by changes to the primary source documents listed in the citations above — rate table revisions, new agency guidance, or regulatory amendments.

Error reports go to info@bedrockatools.com. Corrections are published on our corrections page.