The real cost of generative AI tokens in 2026: the full provider-by-provider comparison

Most generative AI pricing roundups make the same mistake: they compare one flagship model per provider, quote a single input/output token rate, and treat that as the real bill.

By 2026, that is no longer enough. The actual cost of using LLMs is layered: input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, priority premiums, long-context tier jumps, search-grounding fees, hosted retrieval fees, OCR or document-parsing fees, code-execution runtime charges, and sometimes separate browser-automation or container pricing. The model rate still matters, but it is no longer the whole story, and in some real applications it is not even the dominant cost.

This article compares the market the way buyers actually experience it: provider by provider, model by model, and mode by mode, while also separating what stays portable across vendors from what becomes more valuable only if you keep paying for the same surrounding stack.

Methodology and scope

A few ground rules matter for the rest of the comparison:

  • Verified facts: every numeric row below comes from official provider pricing documentation checked on 2026-03-15, unless a row explicitly says that the price was not cleanly exposed in the public static docs view.
  • Widely accepted expert consensus: total LLM cost is driven by more than plain input/output tokens. Caching, long context, retrieval, search, and runtime state change the economics materially.
  • Informed inference: the portability and lock-in judgments are mine. They are not vendor claims.
  • Speculation: none.

More specifically, this comparison covers current billable hosted SKUs and public priced modes from OpenAI, Anthropic, Google Gemini API, Amazon Nova, xAI, Mistral, Cohere, DeepSeek, and Qwen via Alibaba Cloud Model Studio. I am deliberately not mixing in Azure OpenAI, Vertex partner listings, or Bedrock resales of third-party models, because those are channel decisions layered on top of the underlying model economics.

I also need to be explicit about two boundaries. First, I am not exploding every historical alias, retired snapshot, or open-weight download artifact unless the provider still exposes it as a billable public row. Second, parts of the AWS Nova pricing page and parts of xAI's pricing matrix render dynamically. Where the official public page or official search snippet exposed a clean number, I include it. Where the public page clearly listed the product but did not expose a clean numeric row in the static view I could verify, I say so instead of guessing.

Finally, this is a pricing-architecture article, not a benchmark ranking. A cheaper model is not automatically a better value for your use case, and an expensive model is not automatically overpriced. What follows is a cost comparison, not a capability leaderboard.

What "mode" means in this article

A provider no longer sells just "a model." It often sells several pricing modes around the same model:

  • standard vs batch/flex vs priority
  • reasoning vs non-reasoning
  • short-context vs long-context tier
  • text vs realtime vs audio vs image vs video
  • preview/beta vs general availability
  • self-contained inference vs provider-native tools such as search, file search, collections, code execution, or browser automation

That is why the tables below are wider than the usual blog post table. If you compress all of that into one number, you are not doing a comparison. You are doing marketing.

Quick answers by purpose

If you only need the conclusions, start here.

Purpose Best raw price floor Best managed stack Best portability Main catch
Cheapest commodity text generation No single winner. Qwen3.5-Flash Global starts at $0.029 in / $0.287 out, Cohere Command R7B is $0.0375 / $0.15, and Amazon Nova Micro is $0.035 / $0.14. Google Gemini 2.5 Flash-Lite or OpenAI GPT-5 mini if you want a large surrounding platform, not just a cheap model. Mistral Small 3.2 or DeepSeek if you keep storage and retrieval outside the model vendor. Output quality, reasoning depth, context tiers, and tool fees can erase the headline savings.
Premium reasoning On list price, Qwen3-Max Global and Gemini 2.5 Pro look cheaper than GPT-5.4 Pro and Opus 4.6. Grok 4.20 also lands below Anthropic Opus 4.6 on price. OpenAI GPT-5.4 plus native tools, or Gemini 2.5 Pro plus Search/Maps/File Search. Anthropic Sonnet 4.6 with your own retrieval, or DeepSeek if API compatibility matters more than platform depth. This article compares cost architecture, not benchmark quality. Cheap frontier pricing is not a performance verdict.
Long-context work Anthropic Opus 4.6 and Sonnet 4.6 keep 1M context at standard rates. Gemini 2.5 Pro if you also want grounding and multimodal tools. Anthropic or Mistral with self-managed document storage. OpenAI jumps after 272K input for the full session; Gemini 2.5 Pro jumps after 200K input.
Search-grounded answers xAI Web Search is $5 per 1,000 calls. OpenAI and Anthropic are $10. Gemini 3 search is $14, and Gemini 2.5 grounding is $35. Google if you want Search plus Maps and file tools in one stack. Bring your own search layer, or treat search results as portable artifacts and not as provider session state. Search fees are extra. They sit on top of the model tokens.
OCR and document extraction Mistral OCR 3 at $2 per 1,000 pages, or $3 per 1,000 annotated pages. OpenAI, Google, xAI, and AWS all offer document workflows, but they are usually retrieval stacks rather than plain OCR products. Mistral OCR or any transcript/OCR pipeline that returns plain text, markdown, tables, or JSON. File Search, Collections, and Knowledge Bases are not the same thing as OCR. They usually create provider-managed retrieval state.
Realtime voice xAI's realtime API is $0.05 per minute, while OpenAI publishes separate text and audio token rates for its realtime models. OpenAI and Google both expose richer native multimodal realtime paths than the public price sheet alone suggests. Transcripts travel. Realtime session state does not. Per-minute and per-token realtime pricing are not directly comparable without usage assumptions.
Lowest switching pain DeepSeek, Mistral small models, and Qwen are all good starting points if you keep the rest of the stack under your control. None. Managed stacks are exactly where lock-in grows. Mistral plus your own storage/vector DB, Anthropic with BYO retrieval, or DeepSeek because it supports OpenAI-style and Anthropic-style API formats. The provider model is usually not the sticky part. The sticky part is retrieval, cache, search, and runtime state.

The fastest way to misprice an LLM stack is to confuse four different things:

  1. generating text
  2. processing files
  3. retrieving from files later
  4. running the rest of the workflow around the model

Those are not the same product, and providers do not price them the same way.
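To make that separation concrete, here is a minimal sketch of a layered cost estimator. The function name and every default rate are placeholders of my own, not any provider's list price; the point is only that token charges, per-call tool fees, and storage fees accrue on different units and have to be summed separately.

```python
# Minimal sketch of a layered LLM cost model. All rates are hypothetical
# placeholders, not any provider's list price.

def request_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    input_rate: float = 2.50,      # USD per 1M fresh input tokens (placeholder)
    cached_rate: float = 0.25,     # USD per 1M cached input tokens (placeholder)
    output_rate: float = 15.00,    # USD per 1M output tokens (placeholder)
    search_calls: int = 0,
    search_rate_per_1k: float = 10.00,  # USD per 1,000 tool calls (placeholder)
    storage_gb_days: float = 0.0,
    storage_rate: float = 0.10,    # USD per GB per day (placeholder)
) -> float:
    """Sum token charges plus per-call tool fees and storage fees."""
    tokens = (
        (input_tokens - cached_tokens) * input_rate
        + cached_tokens * cached_rate
        + output_tokens * output_rate
    ) / 1_000_000
    tools = search_calls * search_rate_per_1k / 1_000
    storage = storage_gb_days * storage_rate
    return tokens + tools + storage

# A search-grounded request: 20K input (15K cached), 1K output, one search call.
print(round(request_cost(20_000, 1_000, cached_tokens=15_000, search_calls=1), 6))  # 0.04125
```

Note that the single search call ($0.01) costs nearly as much as all the tokens combined ($0.03125), which is exactly why items 2 through 4 cannot be folded into a per-token rate.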

The full 2026 AI token pricing comparison table

All prices below are public USD list prices. Unless otherwise noted, token prices are per 1M tokens. The Category column is there so you can filter the table by product type such as chat, voice, image, video, embeddings, OCR, or agent tooling.

Provider Category Model or mode Public list price Notes
OpenAI Chat GPT-5.4 standard $2.50 in / $0.25 cached / $15 out per 1M tokens Long-context sessions above 272K input are repriced for the full session at $5 / $0.50 / $22.50.
OpenAI Chat GPT-5.4 Batch/Flex $1.25 in / $0.13 cached / $7.50 out Long-context sessions above 272K input reprice to $2.50 / $0.25 / $11.25 for the full session.
OpenAI Chat GPT-5.4 Priority $5 in / $0.50 cached / $30 out Premium latency mode. The captured pricing page did not show a separate long-context priority row.
OpenAI Chat GPT-5.4 Pro standard $30 in / n/a cached / $180 out Long-context sessions above 272K input reprice to $60 / n/a / $270 for the full session.
OpenAI Chat GPT-5.4 Pro Batch/Flex $15 in / n/a cached / $90 out Long-context sessions above 272K input reprice to $30 / n/a / $135 for the full session.
OpenAI Chat GPT-5 mini $0.25 in / $0.025 cached / $2 out The cheapest current OpenAI general text model in the captured public sheet.
OpenAI Voice gpt-realtime-1.5 / gpt-realtime Text $4 / $0.40 / $16; audio $32 / $0.40 / $64; image input $5 / $0.50 One row because the public pricing page lists them together.
OpenAI Voice gpt-realtime-mini Text $0.60 / $0.06 / $2.40; audio $10 / $0.30 / $20; image input $0.80 / $0.08 Cheaper realtime tier.
OpenAI Voice gpt-audio-1.5 Audio $32 in / $64 out per 1M audio tokens Separate from the realtime models.
OpenAI Image GPT Image 1.5 Text $5 / $1.25 / $10; image $8 / $2 / $32 per 1M tokens Also publishes per-image examples: 1024 square low/med/high $0.009 / $0.034 / $0.133.
OpenAI Image GPT Image 1 Text $5 / $1.25 / n/a; image $10 / $2.50 / $40 Per-image 1024 square low/med/high $0.011 / $0.042 / $0.167.
OpenAI Image GPT Image 1 mini Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 Per-image 1024 square low/med/high $0.005 / $0.011 / $0.036.
OpenAI Video Sora 2 $0.10 per second Video generation. Not token-priced.
OpenAI Video Sora 2 Pro 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec Higher-cost video tier.
OpenAI Embeddings text-embedding-3-small $0.02 per 1M tokens Batch is half price.
OpenAI Embeddings text-embedding-3-large $0.13 per 1M tokens Batch is half price.
OpenAI Embeddings text-embedding-ada-002 $0.10 per 1M tokens Legacy embedding row still on the price sheet.
OpenAI Voice gpt-4o-mini-tts $0.015 per minute Speech generation.
OpenAI Voice gpt-4o-transcribe / diarize $0.006 per minute Speech to text.
OpenAI Voice gpt-4o-mini-transcribe $0.003 per minute Cheaper transcription tier.
OpenAI Voice Whisper $0.006 per minute Legacy speech-to-text row still on the price sheet.
OpenAI Voice TTS $15 per 1M characters Legacy text-to-speech.
OpenAI Voice TTS HD $30 per 1M characters Higher-quality legacy text-to-speech.
Anthropic Chat Opus 4.6 $5 in / $6.25 5m cache write / $10 1h cache write / $0.50 cache read / $25 out 1M context stays at standard rates. US-only inference on the Claude API adds a 1.1x multiplier.
Anthropic Chat Opus 4.6 fast mode 6x standard rates; effectively $30 in / $150 out Fast mode is priced at 6x and aims for higher output speed.
Anthropic Chat Opus 4.5 $5 / $6.25 / $10 / $0.50 / $25 Same list price as Opus 4.6 in the current sheet.
Anthropic Chat Opus 4.1 $15 / $18.75 / $30 / $1.50 / $75 Older but still listed.
Anthropic Chat Opus 4 $15 / $18.75 / $30 / $1.50 / $75 Still listed.
Anthropic Chat Sonnet 4.6 $3 / $3.75 / $6 / $0.30 / $15 1M context stays at standard rates.
Anthropic Chat Sonnet 4.5 $3 / $3.75 / $6 / $0.30 / $15 If you use the beta 1M context and go above 200K input, price rises to $6 in / $22.50 out.
Anthropic Chat Sonnet 4 $3 / $3.75 / $6 / $0.30 / $15 If you use the beta 1M context and go above 200K input, price rises to $6 in / $22.50 out.
Anthropic Chat Sonnet 3.7 $3 / $3.75 / $6 / $0.30 / $15 Deprecated, but still shown.
Anthropic Chat Haiku 4.5 $1 / $1.25 / $2 / $0.10 / $5 Cheapest current Claude 4.x tier.
Anthropic Chat Haiku 3.5 $0.80 / $1.00 / $1.60 / $0.08 / $4 Still listed.
Anthropic Chat Opus 3 $15 / $18.75 / $30 / $1.50 / $75 Deprecated legacy row.
Anthropic Chat Haiku 3 $0.25 / $0.30 / $0.50 / $0.03 / $1.25 Deprecated legacy row.
Google Chat Gemini 3.1 Pro Preview <=200K: $2 in / $12 out; >200K: $4 / $18 Cache tokens $0.20 or $0.40; cache storage $4.50 per 1M tokens per hour; batch halves input/output; Search grounding $14 per 1,000 queries after free tier.
Google Chat Gemini 3 Flash Preview Text/image/video $0.50 in; audio $1.00 in; $3 out Cache tokens $0.05 or $0.10; storage $1 per 1M tokens per hour; batch input $0.25 or $0.50 audio and output $1.50.
Google Chat Gemini 3.1 Flash-Lite Preview Text/image/video $0.25 in; audio $0.50 in; $1.50 out Cache tokens $0.025 or $0.05; storage $1 per 1M tokens per hour; batch halves input/output.
Google Image Gemini 3.1 Flash Image Preview Text/image input $0.50; text+thinking output $3; image output $60 per 1M image tokens Published per-image examples: 512px $0.045, 1K $0.067, 2K $0.101, 4K $0.151. Batch halves model rates.
Google Image Gemini 3 Pro Image Preview Text/image input $2.00; text+thinking output $12; image output $120 per 1M image tokens Published per-image examples: about $0.134 at 1K or 2K, $0.24 at 4K. Batch halves model rates.
Google Chat Gemini 2.5 Pro <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 Cache tokens $0.125 or $0.25; storage $4.50/hr; batch halves input/output; Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000.
Google Chat Gemini 2.5 Flash Text/image/video $0.30 in; audio $1.00 in; $2.50 out Cache tokens $0.03 or $0.10; storage $1/hr; batch input $0.15 or $0.50 audio and output $1.25.
Google Chat Gemini 2.5 Flash-Lite Text/image/video $0.10 in; audio $0.30 in; $0.40 out Cache tokens $0.01 or $0.03; storage $1/hr; batch input $0.05 or $0.15 audio and output $0.20.
Google Chat Gemini 2.5 Flash-Lite Preview Same list price as Gemini 2.5 Flash-Lite. Preview row still separately listed.
Google Voice Gemini 2.5 Flash Native Audio Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 Live API row.
Google Image Gemini 2.5 Flash Image Text/image input $0.30; image output $0.039 per image Batch input $0.15; image output $0.0195 per image.
Google Voice Gemini 2.5 Flash Preview TTS Text input $0.50; audio output $10 Batch $0.25 in / $5 out.
Google Voice Gemini 2.5 Pro Preview TTS Text input $1.00; audio output $20 Batch $0.50 in / $10 out.
Google Agent Gemini Robotics-ER 1.5 Preview Text/image/video $0.30 in; audio $1.00 in; $2.50 out Search grounding $35 per 1,000 grounded prompts after free tier. No batch row published yet.
Google Agent Gemini 2.5 Computer Use Preview <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 Specialized computer-use SKU.
Google Chat Gemini 2.0 Flash Text/image/video $0.10 in; audio $0.70 in; $0.40 out Deprecated, shutdown June 1, 2026. Batch $0.05 or $0.35 audio and $0.20 output.
Google Chat Gemini 2.0 Flash-Lite $0.075 in / $0.30 out Deprecated, shutdown June 1, 2026. Batch $0.0375 / $0.15.
Google Embeddings Gemini Embedding 2 Preview Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 Multimodal embedding prices are unitized by modality.
Google Embeddings Gemini Embedding 001 $0.15 per 1M text tokens Batch $0.075.
Google Image Imagen 4 Fast $0.02 per image Image generation.
Google Image Imagen 4 Standard $0.04 per image Image generation.
Google Image Imagen 4 Ultra $0.06 per image Image generation.
Google Video Veo 3.1 Standard 720p or 1080p $0.40/sec; 4K $0.60/sec Video generation.
Google Video Veo 3.1 Fast 720p $0.15/sec; 1080p $0.35/sec Cheaper video tier.
Google Video Veo 3 Standard $0.40 per second Older video generation row.
Google Video Veo 3 Fast $0.15 per second Older fast video row.
Google Video Veo 2 $0.35 per second Older video generation row.
Google Chat Gemma 3 / Gemma 3n No paid list price in the captured pricing page; free-tier rows only Important strategically, but not a normal paid API row in the page reviewed.
Amazon Nova Chat Nova Micro $0.035 in / $0.14 out The cheapest clearly exposed Nova text tier in the official public pricing snippets.
Amazon Nova Chat Nova Lite $0.06 in / $0.24 out Low-cost general Nova tier.
Amazon Nova Chat Nova Pro $0.80 in / $3.20 out Base Pro tier.
Amazon Nova Chat Nova Pro latency optimized $1.00 in / $4.00 out Lower latency premium mode.
Amazon Nova Chat Nova 2 Lite Pricing page snippet showed $0.30 input / $2.50 output The public static view did not expose the full labeled row cleanly enough for a more detailed reproduction.
Amazon Nova Chat Nova 2 Pro Preview Pricing page snippet showed a row beginning with repeated $2.1875 values and $17.50 output The static public view did not expose the table labels cleanly enough to quote the full row with confidence.
Amazon Nova Chat Nova 2 Omni Preview Product listed; public static docs view did not expose a clean list price Exists on the Nova model catalog page.
Amazon Nova Voice Nova 2 Sonic Product listed; public static docs view did not expose a clean list price Speech model family.
Amazon Nova Chat Nova Premier Product listed; public static docs view did not expose a clean list price Most capable multimodal understanding model.
Amazon Nova Image Nova Canvas Product listed; public static docs view did not expose a clean list price Image generation.
Amazon Nova Video Nova Reel Product listed; public static docs view did not expose a clean list price Video generation.
Amazon Nova Embeddings Nova Multimodal Embeddings Product listed; public static docs view did not expose a clean list price Embedding model.
Amazon Nova Agent Nova Act $4.75 per agent-hour Browser automation / computer-use workflow pricing.
Amazon Nova Tooling Nova Forge Annual subscription; public price not disclosed on the public page Model-building service, not token-priced.
Amazon Nova Tooling Bedrock Prompt Optimization $0.03 per 1,000 tokens This is a platform-side add-on, not a model.
xAI Chat Grok 4.20 Beta $2.00 in / $0.20 cached / $6.00 out Current frontier Grok pricing on the official model page.
xAI Chat Grok 4.20 Beta Reasoning $2.00 in / $0.20 cached / $6.00 out Reasoning row is separately listed with the same current public price.
xAI Agent Grok 4.20 Multi Agent Beta 0309 $2.00 in / $0.20 cached / $6.00 out Multi-agent beta row uses the same current public list price.
xAI Chat Grok 4 Fast $0.20 in / $0.50 out Official docs page snippet clearly exposed the output price and the public model index advertises the fast tier. Cached input exists but the snippet I could verify did not expose the exact number.
xAI Chat Grok 4.1 Fast Model is publicly listed; the static docs view did not expose a separate numeric price row I could verify Do not assume a separate list price unless the model page itself shows one.
xAI Chat Grok 4 Fast Non-Reasoning Model is publicly listed; the static docs view did not expose a separate numeric price row I could verify Useful because xAI explicitly distinguishes reasoning and non-reasoning modes.
xAI Chat Grok 4.1 Fast Non-Reasoning Model is publicly listed; the static docs view did not expose a separate numeric price row I could verify Public model listing exists.
xAI Chat Grok Code Fast 1 Output $1.50 per 1M tokens; the static snippet I could verify did not expose the input price cleanly Code-specialized fast model.
xAI Voice Realtime API $0.05 per minute, or $3 per hour Realtime billing line.
xAI Voice TTS Beta $4.20 per 1M characters Speech generation.
xAI Image / Video Image generation / video generation models Products are publicly advertised; the static docs view I reviewed did not expose a stable numeric list price row Do not assume text-model prices apply to image/video products.
Mistral Chat Mistral Large 3 $0.50 in / $1.50 out Open-weight model; strong portability story.
Mistral Chat Mistral Medium 3.1 $0.40 in / $2.00 out Mid-tier reasoning and instruction model.
Mistral Chat Mistral Small 3.2 $0.10 in / $0.30 out Cheap general text tier.
Mistral Chat Devstral Small 2 $0.10 in / $0.30 out Code-oriented small model.
Mistral Chat Codestral $0.30 in / $0.90 out Code model.
Mistral Voice Voxtral Small $0.004 per minute; $0.10 in / $0.30 out Audio-input model with token and minute pricing exposed.
Mistral Embeddings Mistral Embed $0.10 per 1M tokens Text embeddings.
Mistral Embeddings Codestral Embed $0.15 per 1M tokens Code embeddings.
Mistral OCR OCR 3 $2 per 1,000 pages Portable document extraction product.
Mistral OCR OCR 3 Annotated $3 per 1,000 pages Annotated OCR output.
Cohere Chat Command A $2.50 in / $10.00 out Flagship enterprise text model.
Cohere Chat Command A Reasoning Model is live; the public docs view I could verify did not expose a stable paid list price Public docs list the model but not a clean price row in the static view I reviewed.
Cohere Multimodal Command A Vision Model is live; the public docs view I could verify did not expose a stable paid list price Multimodal Command variant.
Cohere Chat Command R $0.15 in / $0.60 out Very competitive general text tier.
Cohere Chat Command R7B $0.0375 in / $0.15 out One of the cheapest named enterprise text models in public docs.
Cohere Chat Command R+ 08-2024 $2.50 in / $10.00 out Older refresh row documented in Cohere's changelog. Treat as legacy unless the live model catalog still lists it for your account.
Cohere Rerank Rerank models Query-priced rather than token-priced The public docs explain the unit, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site.
Cohere Embeddings Embed models Token-priced rather than generation-priced The public docs explain the pricing model, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site.
DeepSeek Chat deepseek-chat $0.28 cache-miss in / $0.028 cache-hit in / $0.42 out Current public sheet maps the alias to DeepSeek-V3.2 in non-thinking mode.
DeepSeek Chat deepseek-reasoner $0.28 cache-miss in / $0.028 cache-hit in / $0.42 out Current public sheet maps the alias to DeepSeek-V3.2 in thinking mode.
Qwen / Alibaba Cloud Chat Qwen3-Max Global Starts at $0.359 in / $1.434 out Tiered by context: 0-32K, 32K-128K, 128K-252K. Batch and cache discounts available.
Qwen / Alibaba Cloud Chat Qwen3-Max International Starts at $1.20 in / $6.00 out Higher-priced deployment mode.
Qwen / Alibaba Cloud Chat Qwen3-Max Chinese Mainland Starts at $0.359 in / $1.434 out Minima match the captured mainland row.
Qwen / Alibaba Cloud Chat Qwen3.5-Plus Global Starts at $0.115 in / $0.688 out Deployment mode and context tier both matter.
Qwen / Alibaba Cloud Chat Qwen3.5-Plus International Starts at $0.40 in / $2.40 out 0-256K row shown in the public pricing page.
Qwen / Alibaba Cloud Chat Qwen3.5-Flash Global Starts at $0.029 in / $0.287 out Tiered by context: 0-128K, 128K-256K, 256K-1M. Batch and cache discounts available.
Qwen / Alibaba Cloud Chat Qwen3.5-Flash International Starts at $0.10 in / $0.40 out Higher-priced deployment mode.
Qwen / Alibaba Cloud Chat Qwen-Plus US Starts at $0.40 in / $1.20 out in non-thinking mode The public page also shows a much more expensive thinking mode, but the static capture did not expose the entire row cleanly enough to reproduce every number with confidence.
Qwen / Alibaba Cloud Chat Qwen-Flash US Starts at $0.05 in / $0.40 out Tiered by context: 0-256K and 256K-1M. Batch is half price where supported.
Qwen / Alibaba Cloud Chat Qwen3-Max snapshots Older snapshot rows are more expensive than the current qwen3-max Examples on the public page include qwen3-max-2025-09-23 and qwen3-max-preview.
Qwen / Alibaba Cloud Multimodal Qwen-VL / QVQ / Qwen-Omni / Qwen-Omni-Realtime Commercial multimodal families are publicly listed; the static page excerpts I reviewed did not expose all of their numeric price rows Do not assume the text-model prices apply to multimodal SKUs.

A few patterns jump out immediately.

First, the cheapest raw text generation is not concentrated in one camp. Qwen3.5-Flash Global pushes the input floor very low. Cohere Command R7B and Amazon Nova Micro are also extremely cheap on output. Mistral Small 3.2 and Devstral Small 2 are still low enough to matter, and Gemini 2.5 Flash-Lite remains a serious budget option. DeepSeek stays attractive because its current price sheet is simple, cheap, and aggressively cached.

Second, premium reasoning is now surprisingly heterogeneous. Anthropic Opus 4.6 is expensive. OpenAI GPT-5.4 Pro is much more expensive. Gemini 2.5 Pro and xAI Grok 4.20 look cheaper on list price, and Qwen3-Max looks cheaper still in the captured Global mode rows. That does not settle the capability question, but it absolutely changes the budget conversation.

Third, "the cheapest provider" is often not the provider with the cheapest whole system. Search-grounded applications, document-heavy workflows, and agentic runtimes can shift the ranking completely once you add tool pricing, retrieval fees, or runtime charges.
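A back-of-envelope sketch shows how quickly tool fees can dominate a short, search-grounded answer. The list prices come from this article's tables (Gemini 2.5 Flash tokens at $0.30 in / $2.50 out with $35 per 1,000 grounded prompts; GPT-5 mini at $0.25 in / $2 out with the $10 per 1,000 web search fee); the function and the token-count assumptions are mine.

```python
# Sketch: per-call grounding fees vs. model tokens for a short grounded answer.
# Rates are list prices quoted in this article; token counts are assumptions.

def grounded_answer_cost(in_rate, out_rate, search_per_1k,
                         input_tokens=2_000, output_tokens=500):
    """USD for one grounded answer: model tokens plus one search/grounding call."""
    model = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return model + search_per_1k / 1_000

# Gemini 2.5 Flash ($0.30 in / $2.50 out) with $35/1k grounding:
print(round(grounded_answer_cost(0.30, 2.50, 35.0), 5))  # 0.03685
# GPT-5 mini ($0.25 in / $2 out) with $10/1k web search:
print(round(grounded_answer_cost(0.25, 2.00, 10.0), 5))  # 0.0115
```

In both cases the grounding fee is an order of magnitude larger than the token bill, so the "cheapest model" ranking and the "cheapest grounded answer" ranking need not agree.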

That shift in total cost is also where portability usually starts to matter more than headline token rates. The next section looks at one of the clearest examples: file handling and retrieval.

Why file processing is often where the lock-in starts

Providers often blur three very different offerings under the phrase "work with your files":

  • file parsing or OCR
  • embeddings plus retrieval
  • provider-managed file search or knowledge-base state

Those have very different portability properties.

If you pay for OCR and receive plain text, markdown, tables, or JSON, you can usually send that output to any other model later. That is portable value.

If you pay for embeddings and keep the vectors in your own vector database, that is partly portable. The data can move, but in practice many teams re-embed when they switch retrieval models so query and document vectors stay in the same space.

If you pay for a provider-managed file search layer, collections product, or knowledge-base service, the useful thing you are buying is no longer just the file. You are buying provider-owned retrieval state. Your PDF may be portable, but the managed index, cache, search path, and tool wiring are not.

That distinction matters more than most "token pricing" articles admit.

OpenAI

OpenAI is no longer just a model vendor. It is a managed runtime that layers retrieval, search, image generation, video generation, containers, speech, embeddings, and file infrastructure around the core inference API. That convenience is part of the appeal, but it also means the real cost of an OpenAI build can drift well beyond the headline model rate.

The biggest OpenAI pricing gotcha in 2026 is long context. On the 1.05M-context GPT-5.4 family, once a session exceeds 272K input tokens, the full session is repriced at 2x input and 1.5x output for standard and batch/flex modes. That is a meaningful pricing cliff, not a minor footnote.
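The full-session repricing rule can be made concrete with a short sketch using the standard GPT-5.4 rates quoted in this article. The function is my own illustration, not an official calculator.

```python
# Worked sketch of the full-session long-context repricing described above,
# using the GPT-5.4 standard list rates quoted in this article.

CLIFF = 272_000  # input-token threshold on the 1.05M-context models

def gpt54_standard_cost(input_tokens: int, output_tokens: int) -> float:
    """USD for a session; the whole session reprices once input crosses CLIFF."""
    if input_tokens > CLIFF:
        in_rate, out_rate = 5.00, 22.50   # 2x input, 1.5x output
    else:
        in_rate, out_rate = 2.50, 15.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One token over the cliff repriles the entire session, not just the overflow.
print(round(gpt54_standard_cost(272_000, 10_000), 4))  # 0.83
print(round(gpt54_standard_cost(272_001, 10_000), 4))  # 1.585
```

A single extra input token nearly doubles the session bill, which is why long-running agent sessions on this family need a context budget, not just a token budget.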

OpenAI's public pricing still includes older GPT-4.1-family and legacy rows. The tables below focus on the current GPT-5.4 stack plus the image, speech, embedding, and tool rows that were cleanly exposed in the public pricing pages reviewed for this piece.

OpenAI text models and purchase modes

SKU Mode Input Cached input Output Notes
GPT-5.4 Standard, under 272K input $2.50 $0.25 $15.00 Base current flagship rate.
GPT-5.4 Standard, session above 272K input $5.00 $0.50 $22.50 The entire session is repriced once you cross 272K input on the 1.05M-context models.
GPT-5.4 Batch/Flex, under 272K input $1.25 $0.13 $7.50 Roughly half of standard pricing.
GPT-5.4 Batch/Flex, session above 272K input $2.50 $0.25 $11.25 Full-session repricing once you cross 272K input.
GPT-5.4 Priority $5.00 $0.50 $30.00 Premium latency mode.
GPT-5.4 Pro Standard, under 272K input $30.00 n/a $180.00 Highest-end OpenAI text tier.
GPT-5.4 Pro Standard, session above 272K input $60.00 n/a $270.00 Full-session repricing beyond 272K input.
GPT-5.4 Pro Batch/Flex, under 272K input $15.00 n/a $90.00 Half-price batch/flex mode.
GPT-5.4 Pro Batch/Flex, session above 272K input $30.00 n/a $135.00 Full-session repricing beyond 272K input.
GPT-5 mini Standard $0.25 $0.025 $2.00 Cheapest clearly exposed current OpenAI text model in the captured pricing page.

OpenAI realtime, audio, and speech

SKU Text input Text cached Text output Audio input Audio cached Audio output Image input Image cached Notes
gpt-realtime-1.5 / gpt-realtime $4.00 $0.40 $16.00 $32.00 $0.40 $64.00 $5.00 $0.50 One combined row in the current public page.
gpt-realtime-mini $0.60 $0.06 $2.40 $10.00 $0.30 $20.00 $0.80 $0.08 Lower-cost realtime tier.
gpt-audio-1.5 n/a n/a n/a $32.00 n/a $64.00 n/a n/a Audio-token pricing only.
SKU Price Notes
gpt-4o-mini-tts $0.015 per minute Speech generation.
gpt-4o-transcribe / diarize $0.006 per minute Speech to text.
gpt-4o-mini-transcribe $0.003 per minute Cheaper speech to text.
Whisper $0.006 per minute Legacy speech-to-text row still on the page.
TTS $15 per 1M characters Legacy text-to-speech.
TTS HD $30 per 1M characters Legacy higher-quality text-to-speech.

OpenAI image, video, and embeddings

SKU Price Notes
GPT Image 1.5 Text $5 / $1.25 / $10; image $8 / $2 / $32 per 1M tokens Per-image examples: 1024 square low/med/high $0.009 / $0.034 / $0.133.
GPT Image 1 Text $5 / $1.25 / n/a; image $10 / $2.50 / $40 Per-image examples: 1024 square low/med/high $0.011 / $0.042 / $0.167.
GPT Image 1 mini Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 Per-image examples: 1024 square low/med/high $0.005 / $0.011 / $0.036.
Sora 2 $0.10 per second Video generation.
Sora 2 Pro 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec Premium video generation.
text-embedding-3-small $0.02 per 1M tokens Batch is half price.
text-embedding-3-large $0.13 per 1M tokens Batch is half price.
text-embedding-ada-002 $0.10 per 1M tokens Legacy embedding row still listed.
DALL-E 3 $0.04 standard square; $0.08 HD square Legacy per-image row still on the price sheet.
DALL-E 2 $0.016 at 256x256; $0.018 at 512x512; $0.02 at 1024x1024 Legacy per-image row still on the price sheet.

If you are comparing dedicated image and video generation APIs rather than broader model stacks, Direct Provider Pricing for Image + Video Generation APIs (USD) breaks out that market separately.

OpenAI tool and platform charges

Service Price Portable output?
Regional/data-residency endpoints for GPT-5.4 +10% on GPT-5.4 and GPT-5.4 Pro Regional deployment choice, not a portable artifact.
Containers Per 20-minute session: 1GB $0.03; 4GB $0.12; 16GB $0.48; 64GB $1.92 Generated files can be exported; the runtime state is OpenAI-only.
File Search storage $0.10 per GB per day, first GB free Your source files and extracted outputs can travel; the hosted vector store cannot.
File Search tool call $2.50 per 1,000 calls Calls use OpenAI-managed retrieval state.
Web Search tool Standard/reasoning preview $10 per 1,000 calls plus search content tokens; non-reasoning preview $25 per 1,000 calls with free search content tokens Returned URLs/snippets are portable; the tool and session state are OpenAI-only.
Moderation Free Yes, as a classification result.

OpenAI portability read

OpenAI outputs travel fine. Text, JSON, code, transcripts, generated media, and your original uploaded files are all reusable elsewhere. The sticky part is everything OpenAI hosts for you: File Search vector stores, Web Search session state, containers, and the higher-level runtime plumbing around the responses stack. If you want low switching pain, keep your source files and any long-lived retrieval assets under your own control.

Anthropic

Anthropic currently has the cleanest cache math in the market. It tells you exactly what cache writes cost, exactly what cache reads cost, and exactly how batch pricing halves the bill. That alone makes Claude much easier to model financially than some competitors.

Anthropic also has the best long-context pricing story among the premium frontier vendors in the official public sheets I reviewed. Opus 4.6 and Sonnet 4.6 keep 1M context at standard rates. That is a meaningful difference from the tier jumps you see elsewhere.
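Those published multipliers (5-minute writes at 1.25x base input, 1-hour writes at 2x, reads at 0.1x) make the input side straightforward to model. The sketch below applies them to Sonnet 4.6's $3 per 1M base input rate from this article's tables; the function and the token counts are my own illustration.

```python
# Sketch of Anthropic's cache arithmetic using the published multipliers,
# with Sonnet 4.6's $3/1M base input rate from this article's tables.

def claude_input_cost(base_rate: float, fresh: int, writes_5m: int,
                      writes_1h: int, reads: int) -> float:
    """Input-side cost in USD given token counts per cache category."""
    return (
        fresh * base_rate
        + writes_5m * base_rate * 1.25   # 5-minute cache write: 1.25x base
        + writes_1h * base_rate * 2.0    # 1-hour cache write: 2x base
        + reads * base_rate * 0.10       # cache read: 0.1x base
    ) / 1_000_000

# First call writes a 50K-token shared prompt to the 5-minute cache...
first = claude_input_cost(3.00, fresh=10_000, writes_5m=50_000, writes_1h=0, reads=0)
# ...later calls inside the window read it back at a tenth of base.
later = claude_input_cost(3.00, fresh=10_000, writes_5m=0, writes_1h=0, reads=50_000)
print(round(first, 4), round(later, 4))  # 0.2175 0.045
```

The write costs more than an uncached call, so caching only pays off when the prompt is reused enough times inside the window, and that break-even point is easy to compute precisely because the multipliers are published.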

Anthropic models

SKU Input 5m cache write 1h cache write Cache read Output Batch input Batch output Notes
Opus 4.6 $5.00 $6.25 $10.00 $0.50 $25.00 $2.50 $12.50 1M context stays at standard rates.
Opus 4.6 fast mode $30.00 $37.50 $60.00 $3.00 $150.00 n/a n/a Fast mode is 6x standard rates.
Opus 4.5 $5.00 $6.25 $10.00 $0.50 $25.00 $2.50 $12.50 Same price as Opus 4.6.
Opus 4.1 $15.00 $18.75 $30.00 $1.50 $75.00 $7.50 $37.50 Still listed.
Opus 4 $15.00 $18.75 $30.00 $1.50 $75.00 $7.50 $37.50 Still listed.
Sonnet 4.6 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 1M context stays at standard rates.
Sonnet 4.5 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 1M beta: above 200K input, price rises to $6 in / $22.50 out.
Sonnet 4 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 1M beta: above 200K input, price rises to $6 in / $22.50 out.
Sonnet 3.7 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 Deprecated row still listed.
Haiku 4.5 $1.00 $1.25 $2.00 $0.10 $5.00 $0.50 $2.50 Cheapest Claude 4.x tier.
Haiku 3.5 $0.80 $1.00 $1.60 $0.08 $4.00 $0.40 $2.00 Still listed.
Opus 3 $15.00 $18.75 $30.00 $1.50 $75.00 $7.50 $37.50 Deprecated legacy row.
Haiku 3 $0.25 $0.30 $0.50 $0.03 $1.25 $0.125 $0.625 Deprecated legacy row.

Anthropic tools and runtime costs

Service Price Portability
Prompt caching 5-minute writes are 1.25x base input; 1-hour writes are 2x; reads are 0.1x Provider-only cache state.
US-only inference on Claude API 1.1x multiplier on all token categories for Opus 4.6 and newer Regional deployment choice, not a portable artifact.
Regional endpoints on Bedrock/Vertex +10% on current 4.5-family regional endpoints Cloud-channel choice, not a portable artifact.
Web Search $10 per 1,000 searches plus normal token charges Returned sources travel; the tool/session state does not.
Web Fetch No additional line-item charge beyond normal token usage Fetched text/output is portable.
Code execution Free when used with web search or web fetch; otherwise first 1,550 hours per org per month free, then $0.05 per hour per container, 5-minute minimum Generated files travel; container state does not.
Bash tool overhead Adds 245 input tokens for the tool definition The overhead is Claude-specific.
Text editor overhead Adds 700 input tokens for the tool definition The overhead is Claude-specific.
Computer use overhead Adds roughly 466-499 tokens to the system prompt, 735 input tokens per tool definition, plus screenshot vision tokens The session/runtime state is provider-only.

Anthropic portability read

Claude itself is not where the strongest lock-in lives. The sticky pieces are prompt cache, code-execution container state, and computer-use runtime state. If you bring your own retrieval layer and use Claude mostly for reasoning and tool orchestration, Anthropic remains one of the easier premium stacks to keep portable.

Google Gemini API

Google is the clearest example of low model list prices living inside a large ecosystem. Gemini 2.5 Pro and Gemini 2.5 Flash-Lite are very competitive at the model layer. But Google also monetizes cache storage, search grounding, maps grounding, embeddings, multimodal outputs, and file-search economics separately.

The other major detail people miss is that Gemini output pricing explicitly includes thinking tokens. That makes direct "output token" comparisons less straightforward if you are comparing Gemini against providers that account for reasoning differently.
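A small sketch makes the thinking-token effect concrete. It uses the Gemini 2.5 Pro <=200K output rate of $10 per 1M tokens from the table below; the token counts are assumptions for illustration only.

```python
# Hedged sketch: Gemini bills visible output and thinking tokens at the
# same output rate, so the billed output can be much larger than the
# answer you see. Rate is Gemini 2.5 Pro's <=200K output tier; the token
# counts are illustrative assumptions.

def gemini_output_cost(visible_toks: int, thinking_toks: int,
                       out_price_per_m: float = 10.00) -> float:
    # Thinking tokens are billed as output tokens alongside the answer.
    return (visible_toks + thinking_toks) / 1_000_000 * out_price_per_m

# A 1K-token visible answer that took 8K thinking tokens to produce
# is billed as 9K output tokens, not 1K.
naive = gemini_output_cost(1_000, 0)
actual = gemini_output_cost(1_000, 8_000)
print(round(naive, 4), round(actual, 4))
```

This is why comparing "output price" cells across vendors is only meaningful once you know what each vendor counts as output.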

Google core text and reasoning models

SKU Standard input Standard output Cache token price Cache storage Batch Notes
Gemini 3.1 Pro Preview <=200K $2.00; >200K $4.00 <=200K $12.00; >200K $18.00 $0.20 or $0.40 $4.50 per 1M tokens/hour Input/output halved Search grounding $14 per 1,000 queries after free tier. Output pricing includes thinking tokens.
Gemini 3 Flash Preview Text/image/video $0.50; audio $1.00 $3.00 $0.05 text/img/video; $0.10 audio $1.00/hr Input $0.25 or $0.50 audio; output $1.50 Output pricing includes thinking tokens.
Gemini 3.1 Flash-Lite Preview Text/image/video $0.25; audio $0.50 $1.50 $0.025 text/img/video; $0.05 audio $1.00/hr Input $0.125 or $0.25 audio; output $0.75 Output pricing includes thinking tokens.
Gemini 2.5 Pro <=200K $1.25; >200K $2.50 <=200K $10.00; >200K $15.00 $0.125 or $0.25 $4.50/hr Input/output halved Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000. Output pricing includes thinking tokens.
Gemini 2.5 Flash Text/image/video $0.30; audio $1.00 $2.50 $0.03 text/img/video; $0.10 audio $1.00/hr Input $0.15 or $0.50 audio; output $1.25 Output pricing includes thinking tokens.
Gemini 2.5 Flash-Lite Text/image/video $0.10; audio $0.30 $0.40 $0.01 text/img/video; $0.03 audio $1.00/hr Input $0.05 or $0.15 audio; output $0.20 Output pricing includes thinking tokens.
Gemini 2.5 Flash-Lite Preview Same as 2.5 Flash-Lite Same as 2.5 Flash-Lite Same as 2.5 Flash-Lite $1.00/hr Same as 2.5 Flash-Lite Preview row still listed separately.
Gemini Robotics-ER 1.5 Preview Text/image/video $0.30; audio $1.00 $2.50 n/a n/a No batch row published Search grounding $35 per 1,000 grounded prompts.
Gemini 2.5 Computer Use Preview <=200K $1.25; >200K $2.50 <=200K $10.00; >200K $15.00 n/a n/a n/a Specialized computer-use row.
Gemini 2.0 Flash Text/image/video $0.10; audio $0.70 $0.40 $0.025 text/img/video; $0.175 audio $1.00/hr Input $0.05 or $0.35 audio; output $0.20 Deprecated. Shutdown June 1, 2026.
Gemini 2.0 Flash-Lite $0.075 $0.30 n/a n/a $0.0375 in / $0.15 out Deprecated. Shutdown June 1, 2026.
Gemma 3 / Gemma 3n Free-tier-only rows in the captured pricing page Free-tier-only rows in the captured pricing page n/a n/a n/a Important open model family, but not shown as a normal paid Developer API row in the pricing page reviewed.

Google image, audio, TTS, and video

SKU Price Notes
Gemini 3.1 Flash Image Preview Text/image input $0.50; text+thinking output $3; image output $60 per 1M image tokens Per-image examples: 512px $0.045, 1K $0.067, 2K $0.101, 4K $0.151. Batch halves model rates.
Gemini 3 Pro Image Preview Text/image input $2.00; text+thinking output $12; image output $120 per 1M image tokens Per-image examples: about $0.134 at 1K or 2K, $0.24 at 4K. Batch halves model rates.
Gemini 2.5 Flash Native Audio Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 Live API native audio row.
Gemini 2.5 Flash Image Text/image input $0.30; image output $0.039 per image Batch input $0.15; image output $0.0195 per image.
Gemini 2.5 Flash Preview TTS Text input $0.50; audio output $10.00 Batch $0.25 in / $5 out.
Gemini 2.5 Pro Preview TTS Text input $1.00; audio output $20.00 Batch $0.50 in / $10 out.
Imagen 4 Fast $0.02 per image Image generation.
Imagen 4 Standard $0.04 per image Image generation.
Imagen 4 Ultra $0.06 per image Image generation.
Veo 3.1 Standard 720p/1080p $0.40 per second; 4K $0.60 per second Video generation.
Veo 3.1 Fast 720p $0.15/sec; 1080p $0.35/sec Cheaper video tier.
Veo 3 Standard $0.40 per second Older video row still listed.
Veo 3 Fast $0.15 per second Older fast video row still listed.
Veo 2 $0.35 per second Older video row still listed.

Google embeddings and tools

Service Price Portability
Gemini Embedding 2 Preview Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 Embeddings can be exported, but in practice you usually re-embed when you switch retrieval models.
Gemini Embedding 001 $0.15 per 1M text tokens; batch $0.075 Same caveat: vectors travel, but retrieval quality usually wants matching query/document models.
Google Search grounding Gemini 2.5: $35 per 1,000 grounded prompts after free tier; Gemini 3: $14 per 1,000 search queries after free tier Returned URLs/snippets travel; grounding service and session state do not.
Google Maps grounding $25 per 1,000 grounded prompts after free tier Returned data is reusable; the service itself is Google-only.
Code execution No separate runtime charge; billed only as normal model tokens Generated files/results can travel; runtime state does not.
URL context No separate tool fee; billed as normal input tokens Fetched text/output is portable.
File Search Tool line itself is free, but embeddings are billed and retrieved document tokens are billed as model tokens Your files travel; the hosted retrieval layer does not.

Google portability read

Raw Gemini outputs are portable. Your grounded URLs, snippets, generated files, and source assets can all be reused elsewhere. The lock-in grows when you adopt Google's native surrounding layers: cache storage, Search/Maps grounding, File Search retrieval state, and the broader Google-native workflow around those tools.

Amazon Nova and Bedrock-side economics

Amazon Nova can look extremely cheap on the pure model rows, especially Nova Micro and Nova Lite. But Nova does not live alone. In practice you are buying Bedrock economics: model inference plus whatever you choose from Prompt Optimization, Guardrails, Knowledge Bases, Data Automation, browser automation, and other AWS-shaped platform features.

That matters because AWS can be the right answer even when it is not the cheapest answer. If your data, IAM, networking, and operations already live in AWS, the integration value may dominate list-price differences. But from a portability perspective, Bedrock is a gravitational field.

Amazon Nova models

SKU Public list price Confidence Notes
Nova Micro $0.035 in / $0.14 out High Clearly exposed in official public pricing snippets.
Nova Lite $0.06 in / $0.24 out High Clearly exposed in official public pricing snippets.
Nova Pro $0.80 in / $3.20 out High Clearly exposed in official public pricing snippets.
Nova Pro latency optimized $1.00 in / $4.00 out High Clearly exposed in official public pricing snippets.
Nova 2 Lite Pricing page snippet showed $0.30 input / $2.50 output Medium The public static page did not expose the labeled row cleanly enough for more detail.
Nova 2 Pro Preview Pricing page snippet showed a row starting with repeated $2.1875 values and $17.50 output Low I am not reproducing the row as fully verified because the table labels were not visible in the static capture.
Nova 2 Omni Preview Not cleanly exposed in the public static docs view Low Product exists on the model catalog page.
Nova 2 Sonic Not cleanly exposed in the public static docs view Low Speech model family.
Nova Premier Not cleanly exposed in the public static docs view Low Most capable multimodal understanding model.
Nova Canvas Not cleanly exposed in the public static docs view Low Image generation model.
Nova Reel Not cleanly exposed in the public static docs view Low Video generation model.
Nova Multimodal Embeddings Not cleanly exposed in the public static docs view Low Embedding model.

AWS platform-side charges around Nova

Service Price Portability
Nova Act $4.75 per agent-hour Generated browser outputs can be exported; the workflow runtime is AWS-specific.
Nova Forge Annual subscription; public price not disclosed Model-building workflow is provider-specific.
Bedrock Prompt Optimization $0.03 per 1,000 tokens Prompt suggestions can travel; the service itself is AWS-specific.
Knowledge Bases / Guardrails / Data Automation Separate Bedrock charges exist, but the static public excerpts I captured did not expose every current numeric row cleanly These services are AWS-shaped platform features, not portable assets.

Amazon Nova portability read

Text and generated assets can be exported anywhere. The sticky value is the AWS platform wrapping: Knowledge Bases, Guardrails, routing, optimization, and agent/browser runtime state. The more of that you adopt, the less meaningful it is to say you are "only paying for a model".

xAI

xAI is more cost-competitive than many people assume, especially if you separate model cost from ecosystem branding. Grok 4.20 is not a bargain-bin model, but Grok 4 Fast is genuinely inexpensive on list price, and xAI's tool charges are unusually clear.

The key thing to understand is that xAI sells both portable outputs and xAI-native retrieval state. A one-shot answer is just an answer. A Collection is not just a file. It is a provider-managed knowledge base with provider-managed search semantics.

xAI models and modes

SKU Input Cached input Output Notes
Grok 4.20 Beta $2.00 $0.20 $6.00 Official model page price.
Grok 4.20 Beta Reasoning $2.00 $0.20 $6.00 Official reasoning page price.
Grok 4.20 Multi Agent Beta 0309 $2.00 $0.20 $6.00 Official multi-agent beta page price.
Grok 4 Fast $0.20 Not cleanly exposed in the snippet I could verify $0.50 The official model page snippet clearly exposed output pricing, and the public model index lists the fast tier.
Grok 4.1 Fast Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Publicly listed model.
Grok 4 Fast Non-Reasoning Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Publicly listed model.
Grok 4.1 Fast Non-Reasoning Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Publicly listed model.
Grok Code Fast 1 Not cleanly exposed in the snippet I could verify Not cleanly exposed in the snippet I could verify $1.50 Official model page snippet exposed output pricing.
Realtime API n/a n/a $0.05 per minute Also shown as $3 per hour.
TTS Beta n/a n/a $4.20 per 1M characters Speech generation.
Image generation / video generation models Not cleanly exposed in the static docs view n/a Not cleanly exposed in the static docs view xAI publicly advertises image generation and video, but the static pricing view I reviewed did not expose stable numeric rows.

xAI tools

Service Price Portability
Batch API 50% discount across input, output, cached input, and reasoning tokens for supported text/language models Batch queue is provider-only; outputs are portable.
Web Search $5 per 1,000 calls Returned links/snippets are portable; search tool state is xAI-only.
X Search $5 per 1,000 calls Returned content is portable; the service is xAI/X-specific.
Code Execution $5 per 1,000 calls Generated files/results travel; runtime state does not.
File Attachments search $10 per 1,000 calls Your files travel; attachment search state is provider-specific.
Collections Search $2.50 per 1,000 calls Collections are xAI-managed knowledge bases and are not cross-provider assets.
view_image / video tools Token-based Results travel; the tool path does not.

xAI portability read

xAI outputs travel well: text, code, JSON, and files are reusable. The sticky pieces are Collections, attachment search state, and the workflow around native tools. xAI is attractive if you want cheap search-grounded or tool-heavy experimentation quickly, but it becomes more provider-specific once Collections become central to the product.

Mistral

Mistral has the strongest portability story in this comparison. That is not only because of price. It is because Mistral repeatedly sells artifacts you can reuse elsewhere: OCR output, embeddings, and open-weight models. Even when you use the hosted API, the strategic posture is different from a provider that wants to own your entire runtime.

Mistral is also one of the few vendors where the open-weight strategy materially changes the buying decision. You can prototype on hosted inference, then self-host or move across clouds later without redesigning the whole stack around a proprietary retrieval product.

Mistral hosted SKUs and utilities

SKU Input Output Notes
Mistral Large 3 $0.50 $1.50 Flagship open-weight model.
Mistral Medium 3.1 $0.40 $2.00 Mid-tier general model.
Mistral Small 3.2 $0.10 $0.30 Cheap general text model.
Devstral Small 2 $0.10 $0.30 Cheap code-oriented model.
Codestral $0.30 $0.90 Code model.
Voxtral Small $0.10 $0.30 Also published as $0.004 per minute for audio.
Mistral Embed $0.10 per 1M tokens n/a Text embeddings.
Codestral Embed $0.15 per 1M tokens n/a Code embeddings.
OCR 3 $2 per 1,000 pages n/a Portable OCR / document extraction.
OCR 3 Annotated $3 per 1,000 pages n/a Annotated OCR output.

Mistral portability read

This is the cleanest escape-hatch story in the market right now. OCR output is reusable anywhere. Open-weight checkpoints can move across clouds or on-prem infrastructure. Embeddings are exportable. If you want the lowest medium-term switching cost, Mistral is one of the strongest answers.

Cohere

Cohere remains very interesting for enterprise text and retrieval-heavy work. Command A is not the cheapest flagship, but Command R and Command R7B are aggressively priced, and Cohere's broader product story still centers on retrieval, control, and private deployment rather than maximal consumer-platform sprawl.

The important caveat is documentation visibility. In the public docs view I could verify today, Cohere clearly exposed the current generation prices for Command A, Command R, and Command R7B, and it clearly explained that Rerank is query-priced and Embed is token-priced. But the static public docs view did not expose a stable current dollar figure for every one of those non-generation SKUs. Rather than guess or mix cloud-channel mirrors into the core row, I am marking those cells honestly.

Cohere current public rows

SKU Input Output Notes
Command A $2.50 $10.00 Current flagship enterprise text model.
Command A Reasoning Not cleanly exposed in the public static docs view Not cleanly exposed in the public static docs view Public model exists; I did not verify a stable paid list price row in the static view.
Command A Vision Not cleanly exposed in the public static docs view Not cleanly exposed in the public static docs view Public model exists; pricing row was not exposed in the snippets I could verify.
Command R $0.15 $0.60 Competitive general text tier.
Command R7B $0.0375 $0.15 Very low cost named enterprise model.
Command R+ 08-2024 $2.50 $10.00 Older refresh row documented in Cohere's changelog. Treat as legacy unless your account still exposes it.
Rerank models Query-priced n/a Cohere's public docs explain the pricing model, but the static public view I reviewed did not surface a stable current dollar figure on Cohere's own site.
Embed models Token-priced n/a Same caveat: pricing model is public, but the static docs view I reviewed did not surface a stable current dollar figure on Cohere's own site.

Cohere portability read

Cohere is relatively portable if you keep your own vector database or use private deployment. Rerank scores and embeddings are not inherently provider-trapped artifacts. The more "sticky" Cohere story, if you choose it, is around managed enterprise deployment and private hosting rather than a giant public-tooling ecosystem.

DeepSeek

DeepSeek has one of the simplest serious price sheets in the market. It currently exposes a very low cache-hit price, a still-low cache-miss input price, and a low output price across both the non-thinking and thinking aliases. The pricing page is refreshingly direct.

The more strategically important point is API compatibility. DeepSeek explicitly supports OpenAI-compatible and Anthropic-compatible formats. That means the migration cost can be lower than the raw token price suggests, because you may not need to rewrite your whole client stack.
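What "OpenAI-compatible" buys you in migration terms is that the request shape stays the same and only the constants change. The sketch below builds a standard OpenAI-style chat-completions payload; the host is DeepSeek's documented API base, but the exact endpoint path and any auth details are assumptions you should verify against the current API reference. The model names come from the pricing rows below.

```python
# Hedged sketch: an OpenAI-compatible provider means switching is mostly
# a base-URL and model-name change, not a client rewrite. The endpoint
# path is the standard OpenAI chat-completions shape and is an assumption
# here; verify against DeepSeek's current API reference before relying on it.
import json

def build_chat_request(base_url: str, model: str, user_text: str):
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, json.dumps(payload)

# The same builder targets either alias; only the model name differs.
chat_req = build_chat_request("https://api.deepseek.com", "deepseek-chat", "hi")
reasoner_req = build_chat_request("https://api.deepseek.com", "deepseek-reasoner", "hi")
print(chat_req[0])
```

The practical consequence: your retry logic, streaming handlers, and payload builders survive the switch, which is a real cost saving that never shows up on the token sheet.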

DeepSeek current public rows

SKU Cache-miss input Cache-hit input Output Notes
deepseek-chat $0.28 $0.028 $0.42 Current sheet maps this alias to DeepSeek-V3.2 in non-thinking mode.
deepseek-reasoner $0.28 $0.028 $0.42 Current sheet maps this alias to DeepSeek-V3.2 in thinking mode.

DeepSeek portability read

The cache itself is provider-specific, but the overall stack is unusually portable because the API surface mirrors other major ecosystems. That is one of the few cases where provider compatibility is almost as valuable as provider price.

Qwen via Alibaba Cloud Model Studio

Qwen can look astonishingly cheap, but it is easy to read the pricing page incorrectly. Deployment mode, region, context tier, and thinking mode all affect the bill. There is no single "Qwen price". There are several Qwen price surfaces, and the cheapest headline row usually belongs to a specific deployment mode and context bracket.

That complexity does not make Qwen unattractive. In fact it is often the opposite. It means the vendor is exposing very aggressive pricing if you can live with the right deployment mode and you understand the tiering. But you do have to read the table carefully.

Qwen deployment-mode overview

Deployment mode Qwen3-Max Qwen3.5-Plus Qwen3.5-Flash Notes
International Starts at $1.20 / $6.00 Starts at $0.40 / $2.40 Starts at $0.10 / $0.40 Higher-priced global commercial tier.
Global Starts at $0.359 / $1.434 Starts at $0.115 / $0.688 Starts at $0.029 / $0.287 Cheapest captured mode for current flagship rows.
US n/a in captured rows Qwen-Plus starts at $0.40 / $1.20 Qwen-Flash starts at $0.05 / $0.40 US uses qwen-plus and qwen-flash naming in the captured rows.
Chinese Mainland Starts at $0.359 / $1.434 Starts at $0.115 / $0.688 Starts at $0.029 / $0.287 Current mainland minima match the captured rows.

Qwen detailed tiered rows captured from the public pricing page

SKU Tier Input Output Notes
qwen3-max / qwen3-max-2026-01-23 0-32K $0.359 $1.434 Global row.
qwen3-max / qwen3-max-2026-01-23 32K-128K $0.574 $2.294 Global row.
qwen3-max / qwen3-max-2026-01-23 128K-252K $1.004 $4.014 Global row.
qwen3-max-2025-09-23 / qwen3-max-preview 0-32K $0.861 $3.441 Older, more expensive snapshot/previews.
qwen3-max-2025-09-23 / qwen3-max-preview 32K-128K $1.434 $5.735 Older, more expensive snapshot/previews.
qwen3-max-2025-09-23 / qwen3-max-preview 128K-252K $2.151 $8.602 Older, more expensive snapshot/previews.
qwen3.5-flash 0-128K $0.029 $0.287 Global row. Supports batch and context cache discounts.
qwen3.5-flash 128K-256K $0.115 $1.147 Global row.
qwen3.5-flash 256K-1M $0.172 $1.720 Global row.
qwen-flash-us 0-256K $0.05 $0.40 US row. Supports batch at half price.
qwen-flash-us 256K-1M $0.25 $2.00 US row. Supports batch at half price.
qwen3.5-plus 0-256K $0.40 $2.40 International row.
qwen3.5-plus 256K-1M $0.50 $3.00 International row.
qwen-plus non-thinking 0-256K $0.40 $1.20 International row.
qwen-plus non-thinking 256K-1M $1.20 $3.60 International row.
qwen-plus thinking Tiered Not cleanly exposed in the public static capture Not cleanly exposed in the public static capture Thinking-mode pricing exists and is materially higher than non-thinking mode.
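The tiered rows above mean there is no single qwen3-max price; cost is a function of prompt size. The sketch below uses the qwen3-max Global tiers from the table. One billing detail is an assumption: it treats the whole request as billed at its tier's rate, rather than only the marginal tokens, which you should verify against the current Model Studio docs.

```python
# Hedged sketch: pick the billing tier from prompt size and compute call
# cost. Tiers are the qwen3-max Global rows from the table above, per 1M
# tokens. Whole-request tiering is an assumption to verify.

QWEN3_MAX_GLOBAL = [   # (tier ceiling in input tokens, $ input, $ output)
    (32_000, 0.359, 1.434),
    (128_000, 0.574, 2.294),
    (252_000, 1.004, 4.014),
]

def qwen_call_cost(input_toks: int, output_toks: int) -> float:
    for ceiling, in_rate, out_rate in QWEN3_MAX_GLOBAL:
        if input_toks <= ceiling:
            # Assumption: the entire request bills at its tier's rates.
            return input_toks / 1e6 * in_rate + output_toks / 1e6 * out_rate
    raise ValueError("input exceeds the 252K ceiling in this sheet")

# Crossing the 128K boundary raises both the per-token rate and total spend.
small = qwen_call_cost(30_000, 2_000)
large = qwen_call_cost(200_000, 2_000)
print(round(small, 5), round(large, 5))
```

This is the concrete version of the warning above: the cheapest headline row belongs to one deployment mode and one context bracket, and your real prompts may not live there.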

Commercial multimodal Qwen families listed without a clean numeric row in the captured static view

Family Status Price visibility Portability
Qwen-VL Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Model outputs travel; Model Studio deployment economics do not.
QVQ Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Same caveat.
Qwen-Omni Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Same caveat.
Qwen-Omni-Realtime Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Same caveat.

Qwen portability read

The hosted Model Studio economics are Alibaba-specific. The broader Qwen family is not. That combination matters. If you want a very cheap hosted commercial tier right now, Qwen is attractive. If you want an exit path later, the wider open-source/open-weight Qwen ecosystem is one of the main reasons the family matters strategically.

A note on Meta and the Llama API

Meta's Llama ecosystem is strategically important and absolutely belongs in any serious conversation about portability, self-hosting, and open-weight alternatives. But I am not putting a numeric Llama API row into the main price comparison because, in the public docs view I could verify today, I could not confirm a stable public pay-as-you-go token sheet accessible without preview gating or login. That is an honesty choice, not an omission by accident.

What can you actually reuse across providers?

This is the part many teams get wrong. They think they are buying "AI" from a provider, when in practice they are buying a mix of portable artifacts and provider-only state.

Provider-by-provider portability map

Provider Outputs that travel well Provider-bound services
OpenAI Text, JSON, code, transcripts, generated files, original uploaded files File Search vector stores, Web Search session state, Containers, response/thread/runtime state
Anthropic Text, JSON, tool results, fetched web content, generated files Prompt cache, code execution container state, computer-use runtime state
Google Text, grounded links/snippets, generated files, original uploaded files Search grounding session state, Maps grounding, cache storage, File Search retrieval state
Amazon Nova / Bedrock Text, generated assets, source files Knowledge Bases, Guardrails, routing, prompt optimization workflow, browser/agent runtimes
xAI Text, JSON, code, generated files, source attachments Collections, attachment search state, tool/runtime state
Mistral Text, OCR output, embeddings, open-weight checkpoints, source files Very little compared with the rest; the main lock-in appears only if you let Mistral own retrieval instead of owning it yourself
Cohere Text, rerank scores, embeddings, private deployment artifacts Managed public API usage is portable enough; Model Vault and private deployment reduce lock-in further
DeepSeek Text, JSON, source files, cached prompt-independent outputs Prompt cache is provider-specific, but the API surface is unusually portable because it mirrors other ecosystems
Qwen / Alibaba Cloud Text, JSON, open-source/open-weight family artifacts, source files Model Studio deployment mode economics, cache, and batch plumbing are Alibaba-specific

What moves cleanly, what does not

Artifact or service Cross-provider? Why Examples
Raw outputs: text, JSON, code Yes They are just artifacts once you store them outside the vendor. Any chat/completion output from any provider.
Source files: PDFs, CSVs, images, audio Yes Keep the originals in your own storage. The same PDF can be sent to OpenAI, Anthropic, Google, Mistral, or xAI.
OCR and transcripts Yes Plain text, markdown, tables, and JSON are portable. Mistral OCR output; OpenAI or Google transcripts; Whisper transcripts.
Embeddings and vectors Partly You can export vectors, but in practice you usually re-embed when you switch retrieval models so query and document embeddings stay aligned. Gemini Embedding, Mistral Embed, Cohere Embed, OpenAI text-embedding-3.
Prompt cache No Cache state is defined by the provider's own runtime and token accounting. OpenAI cached input, Anthropic cache write/read, Google cache storage, DeepSeek cache-hit pricing.
Hosted file search / managed vector stores No You can move the source files, not the hosted index as a portable service. OpenAI File Search, Google File Search, xAI Collections, Bedrock Knowledge Bases.
Grounded web or map search results Partly Returned links/snippets are reusable, but the tool path and session state are provider-specific. OpenAI Web Search, Anthropic Web Search, Google Search/Maps grounding, xAI Web Search/X Search.
Code execution and containers No, except for exported outputs The runtime state lives inside the provider. OpenAI Containers, Anthropic code execution, Google code execution, xAI code execution, Nova Act workflows.
Open-weight models and self-hosted checkpoints Yes You can move them across clouds or self-host them. Mistral open-weight models, the wider Qwen family, Meta Llama families.
API client compatibility Partly If a provider mirrors another provider's API shape, migration is easier even if the model itself is different. DeepSeek supports OpenAI-compatible and Anthropic-compatible formats.

That table gives you a simple rule of thumb:

  • source artifacts usually travel
  • final outputs usually travel
  • provider-managed retrieval usually does not
  • prompt caches do not
  • runtimes and containers do not
  • embeddings are only partly portable in practice
  • open-weight checkpoints are the strongest long-term escape hatch

A simple example: search-grounded pricing can invert the ranking

If your workflow calls search on every turn, the search line itself can dominate the bill surprisingly quickly.

Provider Search tool line only What that means
xAI $5 per 1,000 calls Cheapest clearly exposed first-party search tool fee in this review.
OpenAI $10 per 1,000 calls for standard/reasoning preview modes, plus search content tokens The tool line is moderate, but tokens still apply.
Anthropic $10 per 1,000 searches, plus normal tokens Very similar headline tool fee to OpenAI.
Google Gemini 3 $14 per 1,000 search queries after free tier Cheaper than Gemini 2.5 grounding, pricier than xAI and OpenAI/Anthropic.
Google Gemini 2.5 $35 per 1,000 grounded prompts after free tier Grounding is meaningfully more expensive here than on Gemini 3.

That is before you add the model tokens. So a provider that looks cheap on generation alone can become expensive in a search-heavy product, and a provider that looks expensive on generation alone can become reasonable if its tool line is cheaper or more predictable for your workload.
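Folding the search line into a per-turn bill shows the inversion numerically. In the sketch below, the two "providers" are illustrative composites, not specific vendors: the $35 per 1,000 grounding fee echoes the Gemini 2.5 row and the $5 per 1,000 fee echoes the xAI row, while the token rates and per-turn token counts are assumptions.

```python
# Hedged sketch: total cost for a search-on-every-turn workload. Search
# fees echo rows from the table above; token rates and per-turn token
# counts are illustrative assumptions, not vendor figures.

def turns_cost(n_turns, search_fee_per_k, in_rate, out_rate,
               in_toks=2_000, out_toks=500):
    tokens = n_turns * (in_toks / 1e6 * in_rate + out_toks / 1e6 * out_rate)
    search = n_turns / 1_000 * search_fee_per_k
    return round(tokens + search, 2)

# Provider A: cheap tokens ($0.20 in / $0.50 out) but a $35/1K grounding fee.
# Provider B: pricier tokens ($3 in / $15 out) but a $5/1K search fee.
a = turns_cost(10_000, 35.0, 0.20, 0.50)
b = turns_cost(10_000, 5.0, 3.00, 15.00)
print(a, b)  # the cheap-token provider ends up more expensive here
```

With these assumed numbers, the search fee dominates the bill for the cheap-token provider, which is exactly the ranking inversion the table above warns about.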

The practical buying guide

Here is the cleanest way to think about the market right now.

If you want the cheapest text possible

Start with the low-cost rows that are genuinely public and current: Qwen3.5-Flash Global, Cohere Command R7B, Amazon Nova Micro, Mistral Small 3.2, Devstral Small 2, Gemini 2.5 Flash-Lite, and DeepSeek. Then test the actual output quality you need. The price floor is crowded now. The interesting part is no longer finding a cheap model. It is finding a cheap model that does not force you into expensive surrounding services later.

If you want premium reasoning without instantly paying premium-platform tax

Gemini 2.5 Pro, Grok 4.20, and Qwen3-Max all look more price-aggressive on list price than the top Anthropic and OpenAI premium rows. That does not tell you which one is best for your tasks. It tells you where to begin if you want to avoid assuming that the most visible premium models are also the most financially sensible.

If you want the easiest long-context economics

Anthropic is the cleanest answer in the public sheets I reviewed because Opus 4.6 and Sonnet 4.6 keep 1M context at standard rates. OpenAI and Google can still be excellent choices, but their price jumps are explicit enough that you should model them before you architect around giant prompts.

If you want the lowest future switching pain

Mistral plus your own storage and retrieval stack is probably the cleanest answer. Anthropic with self-managed retrieval is also good. DeepSeek is especially interesting if you already have OpenAI-flavored or Anthropic-flavored client code and want lower migration friction. The deeper you go into File Search, Collections, Knowledge Bases, cache storage, containers, or provider-native search/grounding, the less meaningful it becomes to say that the model layer is portable.

Common questions about AI token pricing

What actually determines the real cost of an LLM stack in 2026?

In this comparison, the real cost is a cost shape, not just a token rate. Input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, long-context repricing, search-grounding fees, hosted retrieval fees, OCR or document parsing charges, and runtime costs can all change the final bill materially.

Is headline token price enough to compare providers?

No. Headline token pricing still matters, but this article argues it is incomplete on its own. Search, retrieval, file processing, runtime, and provider-owned state can all matter enough to change which stack is actually cheaper for a real workload.

Which providers have the lowest future switching pain?

The shortlist in this article is Mistral plus your own storage and retrieval stack, Anthropic with self-managed retrieval, and DeepSeek if API compatibility matters to you. In each case, the lower switching pain comes from keeping retrieval and long-lived state under your control.

Are embeddings and managed file search portable across providers?

Not in the same way. Source files, OCR output, transcripts, and final outputs travel well. Embeddings are only partly portable in practice because many teams re-embed when they switch retrieval models. Managed file search, hosted vector stores, caches, and runtime state are provider-specific.

Bottom line

By 2026, "token price" on its own is an incomplete buying metric. It still matters, but it does not tell you enough about the system you are actually paying for.

What you are really buying is a cost shape: model price, cache behavior, long-context behavior, tool pricing, retrieval pricing, file-processing pricing, runtime pricing, and ultimately the amount of provider-owned state your application accumulates.

If you care most about the lowest raw list price, the market now gives you several clear places to start testing. If you care most about premium managed stacks, OpenAI, Google, AWS, and xAI offer the deepest native ecosystems. If you care most about reducing future switching pain, Mistral, Anthropic with BYO retrieval, and DeepSeek remain the most strategically interesting combinations.

The most honest way to compare providers in 2026 is not to ask which model is cheapest. It is to ask which full cost shape you are actually willing to buy.

Official sources checked
