Question 1

Why is output more expensive than input?

Accepted Answer

Generation is autoregressive: each output token needs its own sequential forward pass, while the whole prompt is processed in parallel in a single pass. Output tokens therefore use the hardware less efficiently, and providers price them higher, typically 3-5x the input rate.

Question 2

How is pricing usually quoted?

Accepted Answer

Per million tokens, split into input and output rates, sometimes with extra tiers for cached input, batch jobs, or extended context. To estimate a cost, multiply each rate by its token count, divide by one million, and add the input and output amounts.
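
The arithmetic can be sketched in a few lines of Python. The function name `estimate_cost` and the workload of 100,000 input and 20,000 output tokens are assumptions chosen for illustration; the rates are example per-million figures, and special tiers (cached input, batch, extended context) are not modeled.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return the estimated cost in dollars.

    Both rates are quoted in dollars per one million tokens.
    """
    input_cost = input_rate * input_tokens / 1_000_000
    output_cost = output_rate * output_tokens / 1_000_000
    return input_cost + output_cost


# Hypothetical workload: 100,000 input tokens and 20,000 output tokens
# at $0.15 (input) and $0.60 (output) per million tokens.
cost = estimate_cost(100_000, 20_000, 0.15, 0.60)
print(f"${cost:.4f}")  # $0.0270
```

Keeping the two rates as separate arguments matters because input and output are billed differently, so the total depends on the mix of tokens and not only on their sum.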

Question 3

Do cheaper models always cost less overall?

Accepted Answer

Not necessarily. A weaker model often needs more retries, longer prompts, or larger few-shot examples to hit the same quality, which can erase the per-token savings.
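
One rough way to see this is to compare cost per successful result instead of cost per call. The sketch below assumes each attempt succeeds independently with a fixed probability, so the expected number of attempts is 1 / success rate. The function name `cost_per_success` and every number in it are hypothetical, not measurements of any real model.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one acceptable result.

    Assumes independent retries with a constant success rate, so the
    expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical figures: the cheap model costs a third as much per call
# but succeeds far less often.
cheap = cost_per_success(cost_per_attempt=0.010, success_rate=0.25)
strong = cost_per_success(cost_per_attempt=0.030, success_rate=0.95)

print(f"cheap model:  ${cheap:.4f} per success")
print(f"strong model: ${strong:.4f} per success")
print("cheaper overall:", "cheap" if cheap < strong else "strong")
```

With these made-up inputs the stronger model comes out ahead; with a higher success rate for the cheap model the result flips, which is why the comparison has to be done on your own workload.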

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Estimated cost ($)
GPT-4o mini	OpenAI	$0.15	$0.60	$0.0270
DeepSeek V3	DeepSeek	$0.27	$1.10	$0.0490
GPT-5 mini	OpenAI	$0.25	$1.25	$0.0500
Gemini 2.5 Flash	Google	$0.30	$2.50	$0.0800
Llama 3.3 70B	Meta (Together)	$0.88	$0.88	$0.1056
Claude Haiku 4.5	Anthropic	$1.00	$5.00	$0.2000
Mistral Large 2	Mistral	$2.00	$6.00	$0.3200
Gemini 2.5 Pro	Google	$1.25	$10.00	$0.3250
GPT-4o	OpenAI	$2.50	$10.00	$0.4500
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$0.6000
GPT-5	OpenAI	$10.00	$30.00	$1.6000
Claude Opus 4.7	Anthropic	$15.00	$75.00	$3.0000
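
The estimated-cost column does not state the usage behind it, but every row is consistent with 100,000 input tokens and 20,000 output tokens. The sketch below treats that workload as an assumption, copies four rows from the table, and rebuilds the ranking; the names `RATES`, `INPUT_TOKENS`, `OUTPUT_TOKENS`, and `estimate_cost` are illustrative.

```python
# (model, input $/1M tokens, output $/1M tokens), copied from the table
RATES = [
    ("Claude Haiku 4.5", 1.00, 5.00),
    ("GPT-4o mini", 0.15, 0.60),
    ("Llama 3.3 70B", 0.88, 0.88),
    ("Gemini 2.5 Flash", 0.30, 2.50),
]

# Assumed usage: inferred from the table's figures, not stated by it.
INPUT_TOKENS = 100_000
OUTPUT_TOKENS = 20_000


def estimate_cost(input_rate: float, output_rate: float) -> float:
    """Dollar cost of the assumed usage at the given per-million rates."""
    return (input_rate * INPUT_TOKENS + output_rate * OUTPUT_TOKENS) / 1_000_000


# Sort cheapest first, as in the cost ranking.
ranking = sorted(
    ((name, estimate_cost(in_rate, out_rate)) for name, in_rate, out_rate in RATES),
    key=lambda row: row[1],
)

for name, cost in ranking:
    print(f"{name:<18} ${cost:.4f}")
```

Changing the two token constants shows how the order shifts with the input/output mix: a model with a low input rate but a high output rate moves down the ranking as the share of output tokens grows.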

LLM Pricing Comparator

Usage estimate

Cost ranking

Frequently asked questions
