AI Cost Calculator
Project monthly LLM API costs across Claude, GPT, and Gemini.
Workload · pick a starting shape
Cache hit rate: fraction of input tokens served from a cached prefix (system prompt, retrieved context). Anthropic cached reads cost 0.1× the input rate; OpenAI cached prefixes cost 0.5×; Gemini has no published cache discount.
Projected cost per model
★ cheapest · sorted by monthly cost
Cached input is billed at 0.1× the input rate for Anthropic and 0.5× for OpenAI; Gemini publishes no cached-input price, so the calculator falls back to the full input rate. Numbers assume the cached portion is held warm; cold-cache writes (Anthropic bills 1.25× the input rate on the first call) are billed separately and become negligible as more calls reuse the same prefix.
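To see how quickly a cold-cache write pays for itself, here is a minimal sketch using the Anthropic multipliers quoted above (1.25× for the first write, 0.1× for each later read). The function name `prefix_cost_multiplier` is hypothetical, and the sketch assumes a single write and a cache that stays warm for every later call.

```python
def prefix_cost_multiplier(calls: int, write_mult: float = 1.25, read_mult: float = 0.1) -> float:
    """Cost of a cached prefix over `calls` requests.

    The result is expressed in units of "one uncached read of the prefix".
    Assumption: one cold write, then every later call is a warm read.
    """
    if calls < 1:
        raise ValueError("calls must be >= 1")
    return write_mult + read_mult * (calls - 1)


for n in (1, 2, 10, 100):
    cached = prefix_cost_multiplier(n)
    uncached = float(n)  # without caching, every call pays the full input rate
    print(f"{n:>4} calls: cached {cached:.2f}x vs uncached {uncached:.2f}x")
```

Under these assumptions caching costs more for a single call, already wins on the second call, and approaches the 0.1× read rate as the call count grows.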
FAQ
How accurate are these cost projections?
Exact for a known workload. The math is straightforward: (uncached input × input rate) + (cached input × cached-input rate) + (output × output rate). Pricing comes from each provider's published rate card. The main source of error is how well the numbers you enter match your actual workload. If you don't know your token sizes, the Token Counter helps you calibrate against a real sample.
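The formula can be written out as a short sketch. This is not the calculator's actual source: the `Rates` class, the `monthly_cost` function, and the sample workload (100,000 requests, 500 input tokens, 200 output tokens, 80% hit rate) are illustrative assumptions. The Sonnet rates are the ones quoted elsewhere on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input_per_m: float        # USD per million input tokens
    output_per_m: float       # USD per million output tokens
    cached_multiplier: float  # cached-input price as a fraction of input (1.0 = no discount)


def monthly_cost(rates: Rates, requests_per_month: int, input_tokens: int,
                 output_tokens: int, cache_hit_rate: float) -> float:
    """Projected monthly cost in USD for one model and one workload shape."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate  # tokens served from the cached prefix
    fresh = input_tokens - cached           # tokens billed at the full input rate
    per_request = (
        fresh * rates.input_per_m
        + cached * rates.input_per_m * rates.cached_multiplier
        + output_tokens * rates.output_per_m
    ) / 1_000_000
    return per_request * requests_per_month


# Claude 3.5 Sonnet rates as quoted on this page; the workload is hypothetical.
sonnet = Rates(input_per_m=3.00, output_per_m=15.00, cached_multiplier=0.1)
cost = monthly_cost(sonnet, requests_per_month=100_000,
                    input_tokens=500, output_tokens=200, cache_hit_rate=0.8)
print(f"${cost:,.2f} per month")  # roughly $342
```

A model with no cached-input price can be represented by setting `cached_multiplier=1.0`, which makes the hit rate irrelevant for that row.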
What is prompt caching and why does it matter?
Anthropic and OpenAI both let you cache the static prefix of a request (typically the system prompt and any retrieved context) and pay a lower rate on subsequent calls that reuse the same prefix. Anthropic cached reads cost 0.1× the input rate (Sonnet input is $3/M, so cached input is $0.30/M). OpenAI cached prefixes on GPT-4o-class models cost 0.5×. For a chatbot with a 400-token cached system prompt and 100-token user messages, that's an 80% hit rate: input cost falls to 0.28× the uncached price on Anthropic (about 3.6× cheaper) and to 0.6× on OpenAI (about 1.7× cheaper). Output tokens are billed as usual, so the reduction in total cost is smaller than the reduction in input cost.
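The arithmetic for the chatbot example can be checked with a small sketch. The helper name `effective_input_multiplier` is hypothetical; the token counts and cache multipliers are the ones given above, and only input cost is considered.

```python
def effective_input_multiplier(hit_rate: float, cached_multiplier: float) -> float:
    """Blended input price as a fraction of the uncached input price."""
    return hit_rate * cached_multiplier + (1.0 - hit_rate)


# 400 cached system-prompt tokens plus 100 uncached user tokens per request.
hit_rate = 400 / (400 + 100)

anthropic = effective_input_multiplier(hit_rate, 0.1)
openai = effective_input_multiplier(hit_rate, 0.5)

print(f"hit rate: {hit_rate:.0%}")
print(f"Anthropic input cost: {anthropic:.2f}x uncached ({1 / anthropic:.1f}x cheaper)")
print(f"OpenAI input cost:    {openai:.2f}x uncached ({1 / openai:.1f}x cheaper)")
```

Because the uncached 20% of input is still billed at the full rate, the saving on input cost is capped well below the headline cache discount.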
Why doesn't Gemini show a cache discount?
Google hasn't published a separate cached-input price for Gemini 1.5 in the rate card we use. The calculator falls back to the full input rate for Gemini, so increasing the cache hit rate doesn't change Gemini's row. If Google adds cached-input pricing, we'll update the table.
Are output tokens really billed higher than input?
Yes — typically 4–5× higher. Claude 3.5 Sonnet is $3/M input vs $15/M output (5×); GPT-4o is $2.50/M vs $10/M (4×). Output is more expensive because it requires autoregressive decoding (one token at a time). The practical consequence: for workloads where the model writes a lot (summarization, generation), output dominates total cost — which is exactly what the "Summarization" template shows.
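As a rough illustration of when output dominates, the sketch below computes the output share of per-request cost from the rates quoted above. The function name `output_share` and the 1,000-in / 500-out workload are assumptions for illustration, and caching is ignored.

```python
def output_share(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Fraction of per-request cost that comes from output tokens (no caching)."""
    input_cost = input_tokens * input_rate
    output_cost = output_tokens * output_rate
    return output_cost / (input_cost + output_cost)


# Hypothetical workload: 1,000 input tokens and 500 output tokens per request.
sonnet_share = output_share(1_000, 500, input_rate=3.00, output_rate=15.00)
gpt4o_share = output_share(1_000, 500, input_rate=2.50, output_rate=10.00)

print(f"Claude 3.5 Sonnet: {sonnet_share:.0%} of cost is output")
print(f"GPT-4o:            {gpt4o_share:.0%} of cost is output")
```

With a 5× output premium, output becomes more than half of the cost once output tokens exceed one fifth of input tokens; with a 4× premium the threshold is one quarter.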
Related
- → LLM Token Counter — calibrate input/output token sizes against real samples
- → Anthropic Console — $5 free credit