Question 1

How accurate are these cost projections?

Accepted Answer

The numbers are exact for a known workload. The math is: (uncached input tokens × input rate) + (cached input tokens × cached-input rate) + (output tokens × output rate). Pricing is pulled from each provider's published rate card and updated periodically. The unknown is how closely your real workload matches the inputs you supplied: if your real average input is 800 tokens and you typed 500, the projected input cost is 37.5% too low (put the other way, the real figure is 60% higher than the projection). The Token Counter at /tools/token-counter/ helps with that calibration.
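The formula can be sketched in a few lines of Python. This is an illustration only, not the calculator's actual source: the function name `projected_cost` and its parameters are assumptions made for this example, and the rates are the Claude 3.5 Sonnet figures quoted elsewhere in this FAQ.

```python
def projected_cost(
    input_tokens: float,
    output_tokens: float,
    input_rate: float,
    output_rate: float,
    cached_input_rate: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """Return cost in dollars. All rates are dollars per million tokens.

    Hypothetical helper for illustration; not the calculator's real code.
    """
    cached = input_tokens * cache_hit_rate   # input tokens served from cache
    uncached = input_tokens - cached         # input tokens billed at full rate
    total = (
        uncached * input_rate
        + cached * cached_input_rate
        + output_tokens * output_rate
    )
    return total / 1_000_000


# Input side only, one million requests, no caching.
# Rates: $3/M input, $15/M output, $0.30/M cached input.
typed = projected_cost(500 * 1_000_000, 0, 3.00, 15.00, 0.30)  # what you entered
real = projected_cost(800 * 1_000_000, 0, 3.00, 15.00, 0.30)   # what you actually send

# Fraction by which the projection falls short of the real input cost.
shortfall = 1 - typed / real
print(f"projected ${typed:,.2f}, real ${real:,.2f}, shortfall {shortfall:.1%}")
```

Because the cost is linear in token counts, any percentage error in the token estimate carries straight through to the corresponding part of the projection.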

Question 2

What is prompt caching and why does it matter?

Accepted Answer

Anthropic and OpenAI let you cache the static prefix of a request (typically the system prompt and retrieved context) and pay a much lower rate on subsequent calls that reuse the same prefix. Anthropic charges 0.1× the input rate for cached reads (Claude 3.5 Sonnet input is $3/M → cached is $0.30/M). OpenAI charges 0.5× for cached prefixes on GPT-4o-class models. For a chatbot with a 400-token cached system prompt and 100-token user messages, 80% of each request's input tokens are cache hits. That cuts input cost by roughly 3.6× at Anthropic's 0.1× rate and about 1.7× at OpenAI's 0.5× rate; output tokens are billed at the normal rate either way.
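The effect of a given cache hit rate on input cost can be worked out directly. The sketch below assumes a simplified model in which every cached token is billed at the cached-read multiplier and nothing else changes. The function name `input_cost_reduction` is made up for this example.

```python
def input_cost_reduction(cache_hit_rate: float, cached_multiplier: float) -> float:
    """Factor by which input cost drops versus no caching.

    cache_hit_rate: fraction of input tokens read from cache (0.0 to 1.0).
    cached_multiplier: cached-read price as a fraction of the input rate.
    Simplified, hypothetical model for illustration only.
    """
    # Blended price per input token, relative to the uncached rate.
    blended = (1 - cache_hit_rate) + cache_hit_rate * cached_multiplier
    return 1 / blended


# 400-token cached system prompt plus a 100-token user message.
hit_rate = 400 / (400 + 100)

anthropic = input_cost_reduction(hit_rate, 0.1)  # 0.1x cached reads
openai = input_cost_reduction(hit_rate, 0.5)     # 0.5x cached prefixes

print(f"hit rate {hit_rate:.0%}: {anthropic:.2f}x vs {openai:.2f}x input-cost reduction")
```

The reduction applies to the input side only, so the saving on the total bill is smaller whenever output tokens make up a meaningful share of the cost.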

Question 3

Why doesn't Gemini show a cache discount?

Accepted Answer

Google hasn't published a separate cached-input price for Gemini 1.5 models in the rate card the calculator uses. The calculator falls back to the full input rate for Gemini, so increasing cache hit rate has no effect on Gemini's row. If Google adds cached-input pricing later, we'll update the table.
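A minimal sketch of that fallback, assuming the rate table stores a missing cached price as `None`. The function names and the $1.00/M rate are placeholders chosen for the example, not real Gemini pricing or the calculator's actual code.

```python
from typing import Optional


def effective_cached_rate(input_rate: float, cached_input_rate: Optional[float]) -> float:
    """Use the cached-input rate if one is published, else the full input rate."""
    return input_rate if cached_input_rate is None else cached_input_rate


def input_cost_per_million(
    input_rate: float,
    cached_input_rate: Optional[float],
    cache_hit_rate: float,
) -> float:
    """Blended dollars per million input tokens at a given cache hit rate."""
    cached_rate = effective_cached_rate(input_rate, cached_input_rate)
    return (1 - cache_hit_rate) * input_rate + cache_hit_rate * cached_rate


# Placeholder rate of $1.00/M with no cached price: the hit rate changes nothing.
no_cache = input_cost_per_million(1.00, None, 0.0)
high_cache = input_cost_per_million(1.00, None, 0.8)
print(no_cache, high_cache)
```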

Question 4

Are output tokens really billed higher than input tokens?

Accepted Answer

Yes, typically 4-5× higher. Claude 3.5 Sonnet is $3/M input vs $15/M output (5×); GPT-4o is $2.50/M vs $10/M (4×). The asymmetry exists because output is produced by autoregressive decoding, one token at a time, which uses hardware less efficiently than processing the whole input in a single parallel pass. The practical consequence: for workloads where the model writes a lot (long-form generation, code generation), output dominates total cost.
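To see how quickly output takes over the bill, the sketch below computes the output share of total cost at the Claude 3.5 Sonnet rates quoted above, ignoring caching. The function name `output_share` and the token counts are assumptions picked for illustration.

```python
def output_share(
    input_tokens: float,
    output_tokens: float,
    input_rate: float,
    output_rate: float,
) -> float:
    """Fraction of total cost attributable to output tokens (no caching)."""
    input_cost = input_tokens * input_rate
    output_cost = output_tokens * output_rate
    return output_cost / (input_cost + output_cost)


# $3/M input, $15/M output.
balanced = output_share(500, 500, 3.00, 15.00)      # equal input and output
input_heavy = output_share(5000, 500, 3.00, 15.00)  # 10x more input than output

print(f"balanced: {balanced:.0%} output, input-heavy: {input_heavy:.0%} output")
```

At a 5× price ratio, output becomes the larger share as soon as output tokens exceed one fifth of input tokens.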

AI Cost Calculator
