The real cost of generative AI tokens in 2026: the full provider-by-provider comparison

Most generative AI pricing roundups make the same mistake: they compare one flagship model per provider, quote a single input/output token rate, and treat that as the real bill.

By 2026, that is no longer enough. The actual cost of using LLMs is layered: input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, priority premiums, long-context tier jumps, search-grounding fees, hosted retrieval fees, OCR or document-parsing fees, code-execution runtime charges, and sometimes separate browser-automation or container pricing. The model rate still matters, but it is no longer the whole story, and in some real applications it is not even the dominant cost.

This article compares the market the way buyers actually experience it: provider by provider, model by model, and mode by mode, while also separating what stays portable across vendors from what becomes more valuable only if you keep paying for the same surrounding stack.

Methodology and scope

A few ground rules matter for the rest of the comparison:

  • Verified facts: every numeric row below comes from official provider pricing documentation checked on 2026-03-15, unless a row explicitly says that the price was not cleanly exposed in the public static docs view.
  • Widely accepted expert consensus: total LLM cost is driven by more than plain input/output tokens. Caching, long context, retrieval, search, and runtime state change the economics materially.
  • Informed inference: the portability and lock-in judgments are mine. They are not vendor claims.
  • Speculation: none.

More specifically, this comparison covers current billable hosted SKUs and public priced modes from OpenAI, Anthropic, Google Gemini API, Amazon Nova, xAI, Mistral, Cohere, DeepSeek, and Qwen via Alibaba Cloud Model Studio. I am deliberately not mixing in Azure OpenAI, Vertex partner listings, or Bedrock resales of third-party models, because those are channel decisions layered on top of the underlying model economics.

I also need to be explicit about two boundaries. First, I am not exploding every historical alias, retired snapshot, or open-weight download artifact unless the provider still exposes it as a billable public row. Second, parts of the AWS Nova pricing page and parts of xAI's pricing matrix render dynamically. Where the official public page or official search snippet exposed a clean number, I include it. Where the public page clearly listed the product but did not expose a clean numeric row in the static view I could verify, I say so instead of guessing.

Finally, this is a pricing-architecture article, not a benchmark ranking. A cheaper model is not automatically a better value for your use case, and an expensive model is not automatically overpriced. What follows is a cost comparison, not a capability leaderboard.

What "mode" means in this article

A provider no longer sells just "a model." It often sells several pricing modes around the same model:

  • standard vs batch/flex vs priority
  • reasoning vs non-reasoning
  • short-context vs long-context tier
  • text vs realtime vs audio vs image vs video
  • preview/beta vs general availability
  • self-contained inference vs provider-native tools such as search, file search, collections, code execution, or browser automation

That is why the tables below are wider than the usual blog post table. If you compress all of that into one number, you are not doing a comparison. You are doing marketing.

Quick answers by purpose

If you only need the conclusions, start here.

Purpose Best raw price floor Best managed stack Best portability Main catch
Cheapest commodity text generation No single winner. Qwen3.5-Flash Global starts at $0.029 in / $0.287 out, Cohere Command R7B is $0.0375 / $0.15, and Amazon Nova Micro is $0.035 / $0.14. Google Gemini 2.5 Flash-Lite or OpenAI GPT-5 mini if you want a large surrounding platform, not just a cheap model. Mistral Small 3.2 or DeepSeek if you keep storage and retrieval outside the model vendor. Output quality, reasoning depth, context tiers, and tool fees can erase the headline savings.
Premium reasoning On list price, Qwen3-Max Global and Gemini 2.5 Pro look cheaper than GPT-5.4 Pro and Opus 4.6. Grok 4.20 also lands below Anthropic Opus 4.6 on price. OpenAI GPT-5.4 plus native tools, or Gemini 2.5 Pro plus Search/Maps/File Search. Anthropic Sonnet 4.6 with your own retrieval, or DeepSeek if API compatibility matters more than platform depth. This article compares cost architecture, not benchmark quality. Cheap frontier pricing is not a performance verdict.
Long-context work Anthropic Opus 4.6 and Sonnet 4.6 keep 1M context at standard rates. Gemini 2.5 Pro if you also want grounding and multimodal tools. Anthropic or Mistral with self-managed document storage. OpenAI jumps after 272K input for the full session; Gemini 2.5 Pro jumps after 200K input.
Search-grounded answers xAI Web Search is $5 per 1,000 calls. OpenAI and Anthropic are $10. Gemini 3 search is $14, and Gemini 2.5 grounding is $35. Google if you want Search plus Maps and file tools in one stack. Bring your own search layer, or treat search results as portable artifacts and not as provider session state. Search fees are extra. They sit on top of the model tokens.
OCR and document extraction Mistral OCR 3 at $2 per 1,000 pages, or $3 per 1,000 annotated pages. OpenAI, Google, xAI, and AWS all offer document workflows, but they are usually retrieval stacks rather than plain OCR products. Mistral OCR or any transcript/OCR pipeline that returns plain text, markdown, tables, or JSON. File Search, Collections, and Knowledge Bases are not the same thing as OCR. They usually create provider-managed retrieval state.
Realtime voice xAI's realtime API is $0.05 per minute, while OpenAI publishes separate text and audio token rates for its realtime models. OpenAI and Google both expose richer native multimodal realtime paths than the public price sheet alone suggests. Transcripts travel. Realtime session state does not. Per-minute and per-token realtime pricing are not directly comparable without usage assumptions.
Lowest switching pain DeepSeek, Mistral small models, and Qwen are all good starting points if you keep the rest of the stack under your control. None. Managed stacks are exactly where lock-in grows. Mistral plus your own storage/vector DB, Anthropic with BYO retrieval, or DeepSeek because it supports OpenAI-style and Anthropic-style API formats. The provider model is usually not the sticky part. The sticky part is retrieval, cache, search, and runtime state.

The fastest way to misprice an LLM stack is to confuse four different things:

  1. generating text
  2. processing files
  3. retrieving from files later
  4. running the rest of the workflow around the model

Those are not the same product, and providers do not price them the same way.
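To make that separation concrete, here is a minimal sketch of a layered cost estimator. The function name and every default rate are placeholders of my own, not any provider's list price; the point is only that token charges, per-call tool fees, and storage fees accrue on different units and have to be summed separately.

```python
# Minimal sketch of a layered LLM cost model. All rates are hypothetical
# placeholders, not any provider's list price.

def request_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    input_rate: float = 2.50,      # USD per 1M fresh input tokens (placeholder)
    cached_rate: float = 0.25,     # USD per 1M cached input tokens (placeholder)
    output_rate: float = 15.00,    # USD per 1M output tokens (placeholder)
    search_calls: int = 0,
    search_rate_per_1k: float = 10.00,  # USD per 1,000 tool calls (placeholder)
    storage_gb_days: float = 0.0,
    storage_rate: float = 0.10,    # USD per GB per day (placeholder)
) -> float:
    """Sum token charges plus per-call tool fees and storage fees."""
    tokens = (
        (input_tokens - cached_tokens) * input_rate
        + cached_tokens * cached_rate
        + output_tokens * output_rate
    ) / 1_000_000
    tools = search_calls * search_rate_per_1k / 1_000
    storage = storage_gb_days * storage_rate
    return tokens + tools + storage

# A search-grounded request: 20K input (15K cached), 1K output, one search call.
print(round(request_cost(20_000, 1_000, cached_tokens=15_000, search_calls=1), 6))  # 0.04125
```

Note that the single search call ($0.01) costs nearly as much as all the tokens combined ($0.03125), which is exactly why items 2 through 4 cannot be folded into a per-token rate.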

The full 2026 AI token pricing comparison table

All prices below are public USD list prices. Unless otherwise noted, token prices are per 1M tokens. The Category column is there so you can filter the table by product type such as chat, voice, image, video, embeddings, OCR, or agent tooling.

Provider Category Model or mode Public list price Notes
OpenAI Chat GPT-5.4 standard $2.50 in / $0.25 cached / $15 out per 1M tokens Long-context sessions above 272K input are repriced for the full session at $5 / $0.50 / $22.50.
OpenAI Chat GPT-5.4 Batch/Flex $1.25 in / $0.13 cached / $7.50 out Long-context sessions above 272K input reprice to $2.50 / $0.25 / $11.25 for the full session.
OpenAI Chat GPT-5.4 Priority $5 in / $0.50 cached / $30 out Premium latency mode. The captured pricing page did not show a separate long-context priority row.
OpenAI Chat GPT-5.4 Pro standard $30 in / n/a cached / $180 out Long-context sessions above 272K input reprice to $60 / n/a / $270 for the full session.
OpenAI Chat GPT-5.4 Pro Batch/Flex $15 in / n/a cached / $90 out Long-context sessions above 272K input reprice to $30 / n/a / $135 for the full session.
OpenAI Chat GPT-5 mini $0.25 in / $0.025 cached / $2 out The cheapest current OpenAI general text model in the captured public sheet.
OpenAI Voice gpt-realtime-1.5 / gpt-realtime Text $4 / $0.40 / $16; audio $32 / $0.40 / $64; image input $5 / $0.50 One row because the public pricing page lists them together.
OpenAI Voice gpt-realtime-mini Text $0.60 / $0.06 / $2.40; audio $10 / $0.30 / $20; image input $0.80 / $0.08 Cheaper realtime tier.
OpenAI Voice gpt-audio-1.5 Audio $32 in / $64 out per 1M audio tokens Separate from the realtime models.
OpenAI Image GPT Image 1.5 Text $5 / $1.25 / $10; image $8 / $2 / $32 per 1M tokens Also publishes per-image examples: 1024 square low/med/high $0.009 / $0.034 / $0.133.
OpenAI Image GPT Image 1 Text $5 / $1.25 / n/a; image $10 / $2.50 / $40 Per-image 1024 square low/med/high $0.011 / $0.042 / $0.167.
OpenAI Image GPT Image 1 mini Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 Per-image 1024 square low/med/high $0.005 / $0.011 / $0.036.
OpenAI Video Sora 2 $0.10 per second Video generation. Not token-priced.
OpenAI Video Sora 2 Pro 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec Higher-cost video tier.
OpenAI Embeddings text-embedding-3-small $0.02 per 1M tokens Batch is half price.
OpenAI Embeddings text-embedding-3-large $0.13 per 1M tokens Batch is half price.
OpenAI Embeddings text-embedding-ada-002 $0.10 per 1M tokens Legacy embedding row still on the price sheet.
OpenAI Voice gpt-4o-mini-tts $0.015 per minute Speech generation.
OpenAI Voice gpt-4o-transcribe / diarize $0.006 per minute Speech to text.
OpenAI Voice gpt-4o-mini-transcribe $0.003 per minute Cheaper transcription tier.
OpenAI Voice Whisper $0.006 per minute Legacy speech-to-text row still on the price sheet.
OpenAI Voice TTS $15 per 1M characters Legacy text-to-speech.
OpenAI Voice TTS HD $30 per 1M characters Higher-quality legacy text-to-speech.
Anthropic Chat Opus 4.6 $5 in / $6.25 5m cache write / $10 1h cache write / $0.50 cache read / $25 out 1M context stays at standard rates. US-only inference on the Claude API adds a 1.1x multiplier.
Anthropic Chat Opus 4.6 fast mode 6x standard rates; effectively $30 in / $150 out Fast mode is priced at 6x and aims for higher output speed.
Anthropic Chat Opus 4.5 $5 / $6.25 / $10 / $0.50 / $25 Same list price as Opus 4.6 in the current sheet.
Anthropic Chat Opus 4.1 $15 / $18.75 / $30 / $1.50 / $75 Older but still listed.
Anthropic Chat Opus 4 $15 / $18.75 / $30 / $1.50 / $75 Still listed.
Anthropic Chat Sonnet 4.6 $3 / $3.75 / $6 / $0.30 / $15 1M context stays at standard rates.
Anthropic Chat Sonnet 4.5 $3 / $3.75 / $6 / $0.30 / $15 If you use the beta 1M context and go above 200K input, price rises to $6 in / $22.50 out.
Anthropic Chat Sonnet 4 $3 / $3.75 / $6 / $0.30 / $15 If you use the beta 1M context and go above 200K input, price rises to $6 in / $22.50 out.
Anthropic Chat Sonnet 3.7 $3 / $3.75 / $6 / $0.30 / $15 Deprecated, but still shown.
Anthropic Chat Haiku 4.5 $1 / $1.25 / $2 / $0.10 / $5 Cheapest current Claude 4.x tier.
Anthropic Chat Haiku 3.5 $0.80 / $1.00 / $1.60 / $0.08 / $4 Still listed.
Anthropic Chat Opus 3 $15 / $18.75 / $30 / $1.50 / $75 Deprecated legacy row.
Anthropic Chat Haiku 3 $0.25 / $0.30 / $0.50 / $0.03 / $1.25 Deprecated legacy row.
Google Chat Gemini 3.1 Pro Preview <=200K: $2 in / $12 out; >200K: $4 / $18 Cache tokens $0.20 or $0.40; cache storage $4.50 per 1M tokens per hour; batch halves input/output; Search grounding $14 per 1,000 queries after free tier.
Google Chat Gemini 3 Flash Preview Text/image/video $0.50 in; audio $1.00 in; $3 out Cache tokens $0.05 or $0.10; storage $1 per 1M tokens per hour; batch input $0.25 or $0.50 audio and output $1.50.
Google Chat Gemini 3.1 Flash-Lite Preview Text/image/video $0.25 in; audio $0.50 in; $1.50 out Cache tokens $0.025 or $0.05; storage $1 per 1M tokens per hour; batch halves input/output.
Google Image Gemini 3.1 Flash Image Preview Text/image input $0.50; text+thinking output $3; image output $60 per 1M image tokens Published per-image examples: 512px $0.045, 1K $0.067, 2K $0.101, 4K $0.151. Batch halves model rates.
Google Image Gemini 3 Pro Image Preview Text/image input $2.00; text+thinking output $12; image output $120 per 1M image tokens Published per-image examples: about $0.134 at 1K or 2K, $0.24 at 4K. Batch halves model rates.
Google Chat Gemini 2.5 Pro <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 Cache tokens $0.125 or $0.25; storage $4.50/hr; batch halves input/output; Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000.
Google Chat Gemini 2.5 Flash Text/image/video $0.30 in; audio $1.00 in; $2.50 out Cache tokens $0.03 or $0.10; storage $1/hr; batch input $0.15 or $0.50 audio and output $1.25.
Google Chat Gemini 2.5 Flash-Lite Text/image/video $0.10 in; audio $0.30 in; $0.40 out Cache tokens $0.01 or $0.03; storage $1/hr; batch input $0.05 or $0.15 audio and output $0.20.
Google Chat Gemini 2.5 Flash-Lite Preview Same list price as Gemini 2.5 Flash-Lite. Preview row still separately listed.
Google Voice Gemini 2.5 Flash Native Audio Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 Live API row.
Google Image Gemini 2.5 Flash Image Text/image input $0.30; image output $0.039 per image Batch input $0.15; image output $0.0195 per image.
Google Voice Gemini 2.5 Flash Preview TTS Text input $0.50; audio output $10 Batch $0.25 in / $5 out.
Google Voice Gemini 2.5 Pro Preview TTS Text input $1.00; audio output $20 Batch $0.50 in / $10 out.
Google Agent Gemini Robotics-ER 1.5 Preview Text/image/video $0.30 in; audio $1.00 in; $2.50 out Search grounding $35 per 1,000 grounded prompts after free tier. No batch row published yet.
Google Agent Gemini 2.5 Computer Use Preview <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 Specialized computer-use SKU.
Google Chat Gemini 2.0 Flash Text/image/video $0.10 in; audio $0.70 in; $0.40 out Deprecated, shutdown June 1, 2026. Batch $0.05 or $0.35 audio and $0.20 output.
Google Chat Gemini 2.0 Flash-Lite $0.075 in / $0.30 out Deprecated, shutdown June 1, 2026. Batch $0.0375 / $0.15.
Google Embeddings Gemini Embedding 2 Preview Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 Multimodal embedding prices are unitized by modality.
Google Embeddings Gemini Embedding 001 $0.15 per 1M text tokens Batch $0.075.
Google Image Imagen 4 Fast $0.02 per image Image generation.
Google Image Imagen 4 Standard $0.04 per image Image generation.
Google Image Imagen 4 Ultra $0.06 per image Image generation.
Google Video Veo 3.1 Standard 720p or 1080p $0.40/sec; 4K $0.60/sec Video generation.
Google Video Veo 3.1 Fast 720p $0.15/sec; 1080p $0.35/sec Cheaper video tier.
Google Video Veo 3 Standard $0.40 per second Older video generation row.
Google Video Veo 3 Fast $0.15 per second Older fast video row.
Google Video Veo 2 $0.35 per second Older video generation row.
Google Chat Gemma 3 / Gemma 3n No paid list price in the captured pricing page; free-tier rows only Important strategically, but not a normal paid API row in the page reviewed.
Amazon Nova Chat Nova Micro $0.035 in / $0.14 out The cheapest clearly exposed Nova text tier in the official public pricing snippets.
Amazon Nova Chat Nova Lite $0.06 in / $0.24 out Low-cost general Nova tier.
Amazon Nova Chat Nova Pro $0.80 in / $3.20 out Base Pro tier.
Amazon Nova Chat Nova Pro latency optimized $1.00 in / $4.00 out Lower latency premium mode.
Amazon Nova Chat Nova 2 Lite Pricing page snippet showed $0.30 input / $2.50 output The public static view did not expose the full labeled row cleanly enough for a more detailed reproduction.
Amazon Nova Chat Nova 2 Pro Preview Pricing page snippet showed a row beginning with repeated $2.1875 values and $17.50 output The static public view did not expose the table labels cleanly enough to quote the full row with confidence.
Amazon Nova Chat Nova 2 Omni Preview Product listed; public static docs view did not expose a clean list price Exists on the Nova model catalog page.
Amazon Nova Voice Nova 2 Sonic Product listed; public static docs view did not expose a clean list price Speech model family.
Amazon Nova Chat Nova Premier Product listed; public static docs view did not expose a clean list price Most capable multimodal understanding model.
Amazon Nova Image Nova Canvas Product listed; public static docs view did not expose a clean list price Image generation.
Amazon Nova Video Nova Reel Product listed; public static docs view did not expose a clean list price Video generation.
Amazon Nova Embeddings Nova Multimodal Embeddings Product listed; public static docs view did not expose a clean list price Embedding model.
Amazon Nova Agent Nova Act $4.75 per agent-hour Browser automation / computer-use workflow pricing.
Amazon Nova Tooling Nova Forge Annual subscription; public price not disclosed on the public page Model-building service, not token-priced.
Amazon Nova Tooling Bedrock Prompt Optimization $0.03 per 1,000 tokens This is a platform-side add-on, not a model.
xAI Chat Grok 4.20 Beta $2.00 in / $0.20 cached / $6.00 out Current frontier Grok pricing on the official model page.
xAI Chat Grok 4.20 Beta Reasoning $2.00 in / $0.20 cached / $6.00 out Reasoning row is separately listed with the same current public price.
xAI Agent Grok 4.20 Multi Agent Beta 0309 $2.00 in / $0.20 cached / $6.00 out Multi-agent beta row uses the same current public list price.
xAI Chat Grok 4 Fast $0.20 in / $0.50 out Official docs page snippet clearly exposed the output price and the public model index advertises the fast tier. Cached input exists but the snippet I could verify did not expose the exact number.
xAI Chat Grok 4.1 Fast Model is publicly listed; the static docs view did not expose a separate numeric price row I could verify Do not assume a separate list price unless the model page itself shows one.
xAI Chat Grok 4 Fast Non-Reasoning Model is publicly listed; the static docs view did not expose a separate numeric price row I could verify Useful because xAI explicitly distinguishes reasoning and non-reasoning modes.
xAI Chat Grok 4.1 Fast Non-Reasoning Model is publicly listed; the static docs view did not expose a separate numeric price row I could verify Public model listing exists.
xAI Chat Grok Code Fast 1 Output $1.50 per 1M tokens; the static snippet I could verify did not expose the input price cleanly Code-specialized fast model.
xAI Voice Realtime API $0.05 per minute, or $3 per hour Realtime billing line.
xAI Voice TTS Beta $4.20 per 1M characters Speech generation.
xAI Image / Video Image generation / video generation models Products are publicly advertised; the static docs view I reviewed did not expose a stable numeric list price row Do not assume text-model prices apply to image/video products.
Mistral Chat Mistral Large 3 $0.50 in / $1.50 out Open-weight model; strong portability story.
Mistral Chat Mistral Medium 3.1 $0.40 in / $2.00 out Mid-tier reasoning and instruction model.
Mistral Chat Mistral Small 3.2 $0.10 in / $0.30 out Cheap general text tier.
Mistral Chat Devstral Small 2 $0.10 in / $0.30 out Code-oriented small model.
Mistral Chat Codestral $0.30 in / $0.90 out Code model.
Mistral Voice Voxtral Small $0.004 per minute; $0.10 in / $0.30 out Audio-input model with token and minute pricing exposed.
Mistral Embeddings Mistral Embed $0.10 per 1M tokens Text embeddings.
Mistral Embeddings Codestral Embed $0.15 per 1M tokens Code embeddings.
Mistral OCR OCR 3 $2 per 1,000 pages Portable document extraction product.
Mistral OCR OCR 3 Annotated $3 per 1,000 pages Annotated OCR output.
Cohere Chat Command A $2.50 in / $10.00 out Flagship enterprise text model.
Cohere Chat Command A Reasoning Model is live; the public docs view I could verify did not expose a stable paid list price Public docs list the model but not a clean price row in the static view I reviewed.
Cohere Multimodal Command A Vision Model is live; the public docs view I could verify did not expose a stable paid list price Multimodal Command variant.
Cohere Chat Command R $0.15 in / $0.60 out Very competitive general text tier.
Cohere Chat Command R7B $0.0375 in / $0.15 out One of the cheapest named enterprise text models in public docs.
Cohere Chat Command R+ 08-2024 $2.50 in / $10.00 out Older refresh row documented in Cohere's changelog. Treat as legacy unless the live model catalog still lists it for your account.
Cohere Rerank Rerank models Query-priced rather than token-priced The public docs explain the unit, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site.
Cohere Embeddings Embed models Token-priced rather than generation-priced The public docs explain the pricing model, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site.
DeepSeek Chat deepseek-chat $0.28 cache-miss in / $0.028 cache-hit in / $0.42 out Current public sheet maps the alias to DeepSeek-V3.2 in non-thinking mode.
DeepSeek Chat deepseek-reasoner $0.28 cache-miss in / $0.028 cache-hit in / $0.42 out Current public sheet maps the alias to DeepSeek-V3.2 in thinking mode.
Qwen / Alibaba Cloud Chat Qwen3-Max Global Starts at $0.359 in / $1.434 out Tiered by context: 0-32K, 32K-128K, 128K-252K. Batch and cache discounts available.
Qwen / Alibaba Cloud Chat Qwen3-Max International Starts at $1.20 in / $6.00 out Higher-priced deployment mode.
Qwen / Alibaba Cloud Chat Qwen3-Max Chinese Mainland Starts at $0.359 in / $1.434 out Minima match the captured mainland row.
Qwen / Alibaba Cloud Chat Qwen3.5-Plus Global Starts at $0.115 in / $0.688 out Deployment mode and context tier both matter.
Qwen / Alibaba Cloud Chat Qwen3.5-Plus International Starts at $0.40 in / $2.40 out 0-256K row shown in the public pricing page.
Qwen / Alibaba Cloud Chat Qwen3.5-Flash Global Starts at $0.029 in / $0.287 out Tiered by context: 0-128K, 128K-256K, 256K-1M. Batch and cache discounts available.
Qwen / Alibaba Cloud Chat Qwen3.5-Flash International Starts at $0.10 in / $0.40 out Higher-priced deployment mode.
Qwen / Alibaba Cloud Chat Qwen-Plus US Starts at $0.40 in / $1.20 out in non-thinking mode The public page also shows a much more expensive thinking mode, but the static capture did not expose the entire row cleanly enough to reproduce every number with confidence.
Qwen / Alibaba Cloud Chat Qwen-Flash US Starts at $0.05 in / $0.40 out Tiered by context: 0-256K and 256K-1M. Batch is half price where supported.
Qwen / Alibaba Cloud Chat Qwen3-Max snapshots Older snapshot rows are more expensive than the current qwen3-max Examples on the public page include qwen3-max-2025-09-23 and qwen3-max-preview.
Qwen / Alibaba Cloud Multimodal Qwen-VL / QVQ / Qwen-Omni / Qwen-Omni-Realtime Commercial multimodal families are publicly listed; the static page excerpts I reviewed did not expose all of their numeric price rows Do not assume the text-model prices apply to multimodal SKUs.

A few patterns jump out immediately.

First, the cheapest raw text generation is not concentrated in one camp. Qwen3.5-Flash Global pushes the input floor very low. Cohere Command R7B and Amazon Nova Micro are also extremely cheap on output. Mistral Small 3.2 and Devstral Small 2 are still low enough to matter, and Gemini 2.5 Flash-Lite remains a serious budget option. DeepSeek stays attractive because its current price sheet is simple, cheap, and aggressively cached.

Second, premium reasoning is now surprisingly heterogeneous. Anthropic Opus 4.6 is expensive. OpenAI GPT-5.4 Pro is much more expensive. Gemini 2.5 Pro and xAI Grok 4.20 look cheaper on list price, and Qwen3-Max looks cheaper still in the captured Global mode rows. That does not settle the capability question, but it absolutely changes the budget conversation.

Third, "the cheapest provider" is often not the provider with the cheapest whole system. Search-grounded applications, document-heavy workflows, and agentic runtimes can shift the ranking completely once you add tool pricing, retrieval fees, or runtime charges.
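A back-of-envelope sketch shows how quickly tool fees can dominate a short, search-grounded answer. The list prices come from this article's tables (Gemini 2.5 Flash tokens at $0.30 in / $2.50 out with $35 per 1,000 grounded prompts; GPT-5 mini at $0.25 in / $2 out with the $10 per 1,000 web search fee); the function and the token-count assumptions are mine.

```python
# Sketch: per-call grounding fees vs. model tokens for a short grounded answer.
# Rates are list prices quoted in this article; token counts are assumptions.

def grounded_answer_cost(in_rate, out_rate, search_per_1k,
                         input_tokens=2_000, output_tokens=500):
    """USD for one grounded answer: model tokens plus one search/grounding call."""
    model = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return model + search_per_1k / 1_000

# Gemini 2.5 Flash ($0.30 in / $2.50 out) with $35/1k grounding:
print(round(grounded_answer_cost(0.30, 2.50, 35.0), 5))  # 0.03685
# GPT-5 mini ($0.25 in / $2 out) with $10/1k web search:
print(round(grounded_answer_cost(0.25, 2.00, 10.0), 5))  # 0.0115
```

In both cases the grounding fee is an order of magnitude larger than the token bill, so the "cheapest model" ranking and the "cheapest grounded answer" ranking need not agree.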

That shift in total cost is also where portability usually starts to matter more than headline token rates. The next section looks at one of the clearest examples: file handling and retrieval.

Why file processing is often where the lock-in starts

Providers often blur three very different offerings under the phrase "work with your files":

  • file parsing or OCR
  • embeddings plus retrieval
  • provider-managed file search or knowledge-base state

Those have very different portability properties.

If you pay for OCR and receive plain text, markdown, tables, or JSON, you can usually send that output to any other model later. That is portable value.

If you pay for embeddings and keep the vectors in your own vector database, that is partly portable. The data can move, but in practice many teams re-embed when they switch retrieval models so query and document vectors stay in the same space.

If you pay for a provider-managed file search layer, collections product, or knowledge-base service, the useful thing you are buying is no longer just the file. You are buying provider-owned retrieval state. Your PDF may be portable, but the managed index, cache, search path, and tool wiring are not.

That distinction matters more than most "token pricing" articles admit.

OpenAI

OpenAI is no longer just a model vendor. It is a managed runtime that layers retrieval, search, image generation, video generation, containers, speech, embeddings, and file infrastructure around the core inference API. That convenience is part of the appeal, but it also means the real cost of an OpenAI build can drift well beyond the headline model rate.

The biggest OpenAI pricing gotcha in 2026 is long context. On the 1.05M-context GPT-5.4 family, once a session exceeds 272K input tokens, the full session is repriced at 2x input and 1.5x output for standard and batch/flex modes. That is a meaningful pricing cliff, not a minor footnote.
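The full-session repricing rule can be made concrete with a short sketch using the standard GPT-5.4 rates quoted in this article. The function is my own illustration, not an official calculator.

```python
# Worked sketch of the full-session long-context repricing described above,
# using the GPT-5.4 standard list rates quoted in this article.

CLIFF = 272_000  # input-token threshold on the 1.05M-context models

def gpt54_standard_cost(input_tokens: int, output_tokens: int) -> float:
    """USD for a session; the whole session reprices once input crosses CLIFF."""
    if input_tokens > CLIFF:
        in_rate, out_rate = 5.00, 22.50   # 2x input, 1.5x output
    else:
        in_rate, out_rate = 2.50, 15.00
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One token over the cliff repriles the entire session, not just the overflow.
print(round(gpt54_standard_cost(272_000, 10_000), 4))  # 0.83
print(round(gpt54_standard_cost(272_001, 10_000), 4))  # 1.585
```

A single extra input token nearly doubles the session bill, which is why long-running agent sessions on this family need a context budget, not just a token budget.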

OpenAI's public pricing still includes older GPT-4.1-family and legacy rows. The tables below focus on the current GPT-5.4 stack plus the image, speech, embedding, and tool rows that were cleanly exposed in the public pricing pages reviewed for this piece.

OpenAI text models and purchase modes

SKU Mode Input Cached input Output Notes
GPT-5.4 Standard, under 272K input $2.50 $0.25 $15.00 Base current flagship rate.
GPT-5.4 Standard, session above 272K input $5.00 $0.50 $22.50 The entire session is repriced once you cross 272K input on the 1.05M-context models.
GPT-5.4 Batch/Flex, under 272K input $1.25 $0.13 $7.50 Roughly half of standard pricing.
GPT-5.4 Batch/Flex, session above 272K input $2.50 $0.25 $11.25 Full-session repricing once you cross 272K input.
GPT-5.4 Priority $5.00 $0.50 $30.00 Premium latency mode.
GPT-5.4 Pro Standard, under 272K input $30.00 n/a $180.00 Highest-end OpenAI text tier.
GPT-5.4 Pro Standard, session above 272K input $60.00 n/a $270.00 Full-session repricing beyond 272K input.
GPT-5.4 Pro Batch/Flex, under 272K input $15.00 n/a $90.00 Half-price batch/flex mode.
GPT-5.4 Pro Batch/Flex, session above 272K input $30.00 n/a $135.00 Full-session repricing beyond 272K input.
GPT-5 mini Standard $0.25 $0.025 $2.00 Cheapest clearly exposed current OpenAI text model in the captured pricing page.

OpenAI realtime, audio, and speech

SKU Text input Text cached Text output Audio input Audio cached Audio output Image input Image cached Notes
gpt-realtime-1.5 / gpt-realtime $4.00 $0.40 $16.00 $32.00 $0.40 $64.00 $5.00 $0.50 One combined row in the current public page.
gpt-realtime-mini $0.60 $0.06 $2.40 $10.00 $0.30 $20.00 $0.80 $0.08 Lower-cost realtime tier.
gpt-audio-1.5 n/a n/a n/a $32.00 n/a $64.00 n/a n/a Audio-token pricing only.
SKU Price Notes
gpt-4o-mini-tts $0.015 per minute Speech generation.
gpt-4o-transcribe / diarize $0.006 per minute Speech to text.
gpt-4o-mini-transcribe $0.003 per minute Cheaper speech to text.
Whisper $0.006 per minute Legacy speech-to-text row still on the page.
TTS $15 per 1M characters Legacy text-to-speech.
TTS HD $30 per 1M characters Legacy higher-quality text-to-speech.

OpenAI image, video, and embeddings

SKU Price Notes
GPT Image 1.5 Text $5 / $1.25 / $10; image $8 / $2 / $32 per 1M tokens Per-image examples: 1024 square low/med/high $0.009 / $0.034 / $0.133.
GPT Image 1 Text $5 / $1.25 / n/a; image $10 / $2.50 / $40 Per-image examples: 1024 square low/med/high $0.011 / $0.042 / $0.167.
GPT Image 1 mini Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 Per-image examples: 1024 square low/med/high $0.005 / $0.011 / $0.036.
Sora 2 $0.10 per second Video generation.
Sora 2 Pro 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec Premium video generation.
text-embedding-3-small $0.02 per 1M tokens Batch is half price.
text-embedding-3-large $0.13 per 1M tokens Batch is half price.
text-embedding-ada-002 $0.10 per 1M tokens Legacy embedding row still listed.
DALL-E 3 $0.04 standard square; $0.08 HD square Legacy per-image row still on the price sheet.
DALL-E 2 $0.016 at 256x256; $0.018 at 512x512; $0.02 at 1024x1024 Legacy per-image row still on the price sheet.

If you are comparing dedicated image and video generation APIs rather than broader model stacks, Direct Provider Pricing for Image + Video Generation APIs (USD) breaks out that market separately.

OpenAI tool and platform charges

Service Price Portable output?
Regional/data-residency endpoints for GPT-5.4 +10% on GPT-5.4 and GPT-5.4 Pro Regional deployment choice, not a portable artifact.
Containers Per 20-minute session: 1GB $0.03; 4GB $0.12; 16GB $0.48; 64GB $1.92 Generated files can be exported; the runtime state is OpenAI-only.
File Search storage $0.10 per GB per day, first GB free Your source files and extracted outputs can travel; the hosted vector store cannot.
File Search tool call $2.50 per 1,000 calls Calls use OpenAI-managed retrieval state.
Web Search tool Standard/reasoning preview $10 per 1,000 calls plus search content tokens; non-reasoning preview $25 per 1,000 calls with free search content tokens Returned URLs/snippets are portable; the tool and session state are OpenAI-only.
Moderation Free Yes, as a classification result.

OpenAI portability read

OpenAI outputs travel fine. Text, JSON, code, transcripts, generated media, and your original uploaded files are all reusable elsewhere. The sticky part is everything OpenAI hosts for you: File Search vector stores, Web Search session state, containers, and the higher-level runtime plumbing around the responses stack. If you want low switching pain, keep your source files and any long-lived retrieval assets under your own control.

Anthropic

Anthropic currently has the cleanest cache math in the market. It tells you exactly what cache writes cost, exactly what cache reads cost, and exactly how batch pricing halves the bill. That alone makes Claude much easier to model financially than some competitors.

Anthropic also has the best long-context pricing story among the premium frontier vendors in the official public sheets I reviewed. Opus 4.6 and Sonnet 4.6 keep 1M context at standard rates. That is a meaningful difference from the tier jumps you see elsewhere.
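Those published multipliers (5-minute writes at 1.25x base input, 1-hour writes at 2x, reads at 0.1x) make the input side straightforward to model. The sketch below applies them to Sonnet 4.6's $3 per 1M base input rate from this article's tables; the function and the token counts are my own illustration.

```python
# Sketch of Anthropic's cache arithmetic using the published multipliers,
# with Sonnet 4.6's $3/1M base input rate from this article's tables.

def claude_input_cost(base_rate: float, fresh: int, writes_5m: int,
                      writes_1h: int, reads: int) -> float:
    """Input-side cost in USD given token counts per cache category."""
    return (
        fresh * base_rate
        + writes_5m * base_rate * 1.25   # 5-minute cache write: 1.25x base
        + writes_1h * base_rate * 2.0    # 1-hour cache write: 2x base
        + reads * base_rate * 0.10       # cache read: 0.1x base
    ) / 1_000_000

# First call writes a 50K-token shared prompt to the 5-minute cache...
first = claude_input_cost(3.00, fresh=10_000, writes_5m=50_000, writes_1h=0, reads=0)
# ...later calls inside the window read it back at a tenth of base.
later = claude_input_cost(3.00, fresh=10_000, writes_5m=0, writes_1h=0, reads=50_000)
print(round(first, 4), round(later, 4))  # 0.2175 0.045
```

The write costs more than an uncached call, so caching only pays off when the prompt is reused enough times inside the window, and that break-even point is easy to compute precisely because the multipliers are published.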

Anthropic models

SKU Input 5m cache write 1h cache write Cache read Output Batch input Batch output Notes
Opus 4.6 $5.00 $6.25 $10.00 $0.50 $25.00 $2.50 $12.50 1M context stays at standard rates.
Opus 4.6 fast mode $30.00 $37.50 $60.00 $3.00 $150.00 n/a n/a Fast mode is 6x standard rates.
Opus 4.5 $5.00 $6.25 $10.00 $0.50 $25.00 $2.50 $12.50 Same price as Opus 4.6.
Opus 4.1 $15.00 $18.75 $30.00 $1.50 $75.00 $7.50 $37.50 Still listed.
Opus 4 $15.00 $18.75 $30.00 $1.50 $75.00 $7.50 $37.50 Still listed.
Sonnet 4.6 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 1M context stays at standard rates.
Sonnet 4.5 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 1M beta: above 200K input, price rises to $6 in / $22.50 out.
Sonnet 4 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 1M beta: above 200K input, price rises to $6 in / $22.50 out.
Sonnet 3.7 $3.00 $3.75 $6.00 $0.30 $15.00 $1.50 $7.50 Deprecated row still listed.
Haiku 4.5 $1.00 $1.25 $2.00 $0.10 $5.00 $0.50 $2.50 Cheapest Claude 4.x tier.
Haiku 3.5 $0.80 $1.00 $1.60 $0.08 $4.00 $0.40 $2.00 Still listed.
Opus 3 $15.00 $18.75 $30.00 $1.50 $75.00 $7.50 $37.50 Deprecated legacy row.
Haiku 3 $0.25 $0.30 $0.50 $0.03 $1.25 $0.125 $0.625 Deprecated legacy row.

Anthropic tools and runtime costs

Service Price Portability
Prompt caching 5-minute writes are 1.25x base input; 1-hour writes are 2x; reads are 0.1x Provider-only cache state.
US-only inference on Claude API 1.1x multiplier on all token categories for Opus 4.6 and newer Regional deployment choice, not a portable artifact.
Regional endpoints on Bedrock/Vertex +10% on current 4.5-family regional endpoints Cloud-channel choice, not a portable artifact.
Web Search $10 per 1,000 searches plus normal token charges Returned sources travel; the tool/session state does not.
Web Fetch No additional line-item charge beyond normal token usage Fetched text/output is portable.
Code execution Free when used with web search or web fetch; otherwise first 1,550 hours per org per month free, then $0.05 per hour per container, 5-minute minimum Generated files travel; container state does not.
Bash tool overhead Adds 245 input tokens for the tool definition The overhead is Claude-specific.
Text editor overhead Adds 700 input tokens for the tool definition The overhead is Claude-specific.
Computer use overhead Adds roughly 466-499 tokens to the system prompt, 735 input tokens per tool definition, plus screenshot vision tokens The session/runtime state is provider-only.

Anthropic portability read

Claude itself is not where the strongest lock-in lives. The sticky pieces are prompt cache, code-execution container state, and computer-use runtime state. If you bring your own retrieval layer and use Claude mostly for reasoning and tool orchestration, Anthropic remains one of the easier premium stacks to keep portable.

Google Gemini API

Google is the clearest example of low model list prices living inside a large ecosystem. Gemini 2.5 Pro and Gemini 2.5 Flash-Lite are very competitive at the model layer. But Google also monetizes cache storage, search grounding, maps grounding, embeddings, multimodal outputs, and file-search economics separately.

The other major detail people miss is that Gemini output pricing explicitly includes thinking tokens. That makes direct "output token" comparisons less straightforward if you are comparing Gemini against providers that account for reasoning differently.
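A small sketch makes the thinking-token effect concrete. It uses the Gemini 2.5 Pro <=200K output rate of $10 per 1M tokens from the table below; the token counts are assumptions for illustration only.

```python
# Hedged sketch: Gemini bills visible output and thinking tokens at the
# same output rate, so the billed output can be much larger than the
# answer you see. Rate is Gemini 2.5 Pro's <=200K output tier; the token
# counts are illustrative assumptions.

def gemini_output_cost(visible_toks: int, thinking_toks: int,
                       out_price_per_m: float = 10.00) -> float:
    # Thinking tokens are billed as output tokens alongside the answer.
    return (visible_toks + thinking_toks) / 1_000_000 * out_price_per_m

# A 1K-token visible answer that took 8K thinking tokens to produce
# is billed as 9K output tokens, not 1K.
naive = gemini_output_cost(1_000, 0)
actual = gemini_output_cost(1_000, 8_000)
print(round(naive, 4), round(actual, 4))
```

This is why comparing "output price" cells across vendors is only meaningful once you know what each vendor counts as output.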

Google core text and reasoning models

SKU Standard input Standard output Cache token price Cache storage Batch Notes
Gemini 3.1 Pro Preview <=200K $2.00; >200K $4.00 <=200K $12.00; >200K $18.00 $0.20 or $0.40 $4.50 per 1M tokens/hour Input/output halved Search grounding $14 per 1,000 queries after free tier. Output pricing includes thinking tokens.
Gemini 3 Flash Preview Text/image/video $0.50; audio $1.00 $3.00 $0.05 text/img/video; $0.10 audio $1.00/hr Input $0.25 or $0.50 audio; output $1.50 Output pricing includes thinking tokens.
Gemini 3.1 Flash-Lite Preview Text/image/video $0.25; audio $0.50 $1.50 $0.025 text/img/video; $0.05 audio $1.00/hr Input $0.125 or $0.25 audio; output $0.75 Output pricing includes thinking tokens.
Gemini 2.5 Pro <=200K $1.25; >200K $2.50 <=200K $10.00; >200K $15.00 $0.125 or $0.25 $4.50/hr Input/output halved Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000. Output pricing includes thinking tokens.
Gemini 2.5 Flash Text/image/video $0.30; audio $1.00 $2.50 $0.03 text/img/video; $0.10 audio $1.00/hr Input $0.15 or $0.50 audio; output $1.25 Output pricing includes thinking tokens.
Gemini 2.5 Flash-Lite Text/image/video $0.10; audio $0.30 $0.40 $0.01 text/img/video; $0.03 audio $1.00/hr Input $0.05 or $0.15 audio; output $0.20 Output pricing includes thinking tokens.
Gemini 2.5 Flash-Lite Preview Same as 2.5 Flash-Lite Same as 2.5 Flash-Lite Same as 2.5 Flash-Lite $1.00/hr Same as 2.5 Flash-Lite Preview row still listed separately.
Gemini Robotics-ER 1.5 Preview Text/image/video $0.30; audio $1.00 $2.50 n/a n/a No batch row published Search grounding $35 per 1,000 grounded prompts.
Gemini 2.5 Computer Use Preview <=200K $1.25; >200K $2.50 <=200K $10.00; >200K $15.00 n/a n/a n/a Specialized computer-use row.
Gemini 2.0 Flash Text/image/video $0.10; audio $0.70 $0.40 $0.025 text/img/video; $0.175 audio $1.00/hr Input $0.05 or $0.35 audio; output $0.20 Deprecated. Shutdown June 1, 2026.
Gemini 2.0 Flash-Lite $0.075 $0.30 n/a n/a $0.0375 in / $0.15 out Deprecated. Shutdown June 1, 2026.
Gemma 3 / Gemma 3n Free-tier-only rows in the captured pricing page Free-tier-only rows in the captured pricing page n/a n/a n/a Important open model family, but not shown as a normal paid Developer API row in the pricing page reviewed.

Google image, audio, TTS, and video

SKU Price Notes
Gemini 3.1 Flash Image Preview Text/image input $0.50; text+thinking output $3; image output $60 per 1M image tokens Per-image examples: 512px $0.045, 1K $0.067, 2K $0.101, 4K $0.151. Batch halves model rates.
Gemini 3 Pro Image Preview Text/image input $2.00; text+thinking output $12; image output $120 per 1M image tokens Per-image examples: about $0.134 at 1K or 2K, $0.24 at 4K. Batch halves model rates.
Gemini 2.5 Flash Native Audio Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 Live API native audio row.
Gemini 2.5 Flash Image Text/image input $0.30; image output $0.039 per image Batch input $0.15; image output $0.0195 per image.
Gemini 2.5 Flash Preview TTS Text input $0.50; audio output $10.00 Batch $0.25 in / $5 out.
Gemini 2.5 Pro Preview TTS Text input $1.00; audio output $20.00 Batch $0.50 in / $10 out.
Imagen 4 Fast $0.02 per image Image generation.
Imagen 4 Standard $0.04 per image Image generation.
Imagen 4 Ultra $0.06 per image Image generation.
Veo 3.1 Standard 720p/1080p $0.40 per second; 4K $0.60 per second Video generation.
Veo 3.1 Fast 720p $0.15/sec; 1080p $0.35/sec Cheaper video tier.
Veo 3 Standard $0.40 per second Older video row still listed.
Veo 3 Fast $0.15 per second Older fast video row still listed.
Veo 2 $0.35 per second Older video row still listed.

Google embeddings and tools

Service Price Portability
Gemini Embedding 2 Preview Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 Embeddings can be exported, but in practice you usually re-embed when you switch retrieval models.
Gemini Embedding 001 $0.15 per 1M text tokens; batch $0.075 Same caveat: vectors travel, but retrieval quality usually wants matching query/document models.
Google Search grounding Gemini 2.5: $35 per 1,000 grounded prompts after free tier; Gemini 3: $14 per 1,000 search queries after free tier Returned URLs/snippets travel; grounding service and session state do not.
Google Maps grounding $25 per 1,000 grounded prompts after free tier Returned data is reusable; the service itself is Google-only.
Code execution No separate runtime charge; billed only as normal model tokens Generated files/results can travel; runtime state does not.
URL context No separate tool fee; billed as normal input tokens Fetched text/output is portable.
File Search Tool line itself is free, but embeddings are billed and retrieved document tokens are billed as model tokens Your files travel; the hosted retrieval layer does not.

Google portability read

Raw Gemini outputs are portable. Your grounded URLs, snippets, generated files, and source assets can all be reused elsewhere. The lock-in grows when you adopt Google's native surrounding layers: cache storage, Search/Maps grounding, File Search retrieval state, and the broader Google-native workflow around those tools.

Amazon Nova and Bedrock-side economics

Amazon Nova can look extremely cheap on the pure model rows, especially Nova Micro and Nova Lite. But Nova does not live alone. In practice you are buying Bedrock economics: model inference plus whatever you choose from Prompt Optimization, Guardrails, Knowledge Bases, Data Automation, browser automation, and other AWS-shaped platform features.

That matters because AWS can be the right answer even when it is not the cheapest answer. If your data, IAM, networking, and operations already live in AWS, the integration value may dominate list-price differences. But from a portability perspective, Bedrock is a gravitational field.

Amazon Nova models

SKU Public list price Confidence Notes
Nova Micro $0.035 in / $0.14 out High Clearly exposed in official public pricing snippets.
Nova Lite $0.06 in / $0.24 out High Clearly exposed in official public pricing snippets.
Nova Pro $0.80 in / $3.20 out High Clearly exposed in official public pricing snippets.
Nova Pro latency optimized $1.00 in / $4.00 out High Clearly exposed in official public pricing snippets.
Nova 2 Lite Pricing page snippet showed $0.30 input / $2.50 output Medium The public static page did not expose the labeled row cleanly enough for more detail.
Nova 2 Pro Preview Pricing page snippet showed a row starting with repeated $2.1875 values and $17.50 output Low I am not reproducing the row as fully verified because the table labels were not visible in the static capture.
Nova 2 Omni Preview Not cleanly exposed in the public static docs view Low Product exists on the model catalog page.
Nova 2 Sonic Not cleanly exposed in the public static docs view Low Speech model family.
Nova Premier Not cleanly exposed in the public static docs view Low Most capable multimodal understanding model.
Nova Canvas Not cleanly exposed in the public static docs view Low Image generation model.
Nova Reel Not cleanly exposed in the public static docs view Low Video generation model.
Nova Multimodal Embeddings Not cleanly exposed in the public static docs view Low Embedding model.

AWS platform-side charges around Nova

Service Price Portability
Nova Act $4.75 per agent-hour Generated browser outputs can be exported; the workflow runtime is AWS-specific.
Nova Forge Annual subscription; public price not disclosed Model-building workflow is provider-specific.
Bedrock Prompt Optimization $0.03 per 1,000 tokens Prompt suggestions can travel; the service itself is AWS-specific.
Knowledge Bases / Guardrails / Data Automation Separate Bedrock charges exist, but the static public excerpts I captured did not expose every current numeric row cleanly These services are AWS-shaped platform features, not portable assets.

Amazon Nova portability read

Text and generated assets can be exported anywhere. The sticky value is the AWS platform wrapping: Knowledge Bases, Guardrails, routing, optimization, and agent/browser runtime state. The more of that you adopt, the less meaningful it is to say you are "only paying for a model".

xAI

xAI is more cost-competitive than many people assume, especially if you separate model cost from ecosystem branding. Grok 4.20 is not a bargain-bin model, but Grok 4 Fast is genuinely inexpensive on list price, and xAI's tool charges are unusually clear.

The key thing to understand is that xAI sells both portable outputs and xAI-native retrieval state. A one-shot answer is just an answer. A Collection is not just a file. It is a provider-managed knowledge base with provider-managed search semantics.

xAI models and modes

SKU Input Cached input Output Notes
Grok 4.20 Beta $2.00 $0.20 $6.00 Official model page price.
Grok 4.20 Beta Reasoning $2.00 $0.20 $6.00 Official reasoning page price.
Grok 4.20 Multi Agent Beta 0309 $2.00 $0.20 $6.00 Official multi-agent beta page price.
Grok 4 Fast $0.20 Not cleanly exposed in the snippet I could verify $0.50 The official model page snippet clearly exposed output pricing, and the public model index lists the fast tier.
Grok 4.1 Fast Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Publicly listed model.
Grok 4 Fast Non-Reasoning Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Publicly listed model.
Grok 4.1 Fast Non-Reasoning Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Not cleanly exposed in the static docs view Publicly listed model.
Grok Code Fast 1 Not cleanly exposed in the snippet I could verify Not cleanly exposed in the snippet I could verify $1.50 Official model page snippet exposed output pricing.
Realtime API n/a n/a $0.05 per minute Also shown as $3 per hour.
TTS Beta n/a n/a $4.20 per 1M characters Speech generation.
Image generation / video generation models Not cleanly exposed in the static docs view n/a Not cleanly exposed in the static docs view xAI publicly advertises image generation and video, but the static pricing view I reviewed did not expose stable numeric rows.

xAI tools

Service Price Portability
Batch API 50% discount across input, output, cached input, and reasoning tokens for supported text/language models Batch queue is provider-only; outputs are portable.
Web Search $5 per 1,000 calls Returned links/snippets are portable; search tool state is xAI-only.
X Search $5 per 1,000 calls Returned content is portable; the service is xAI/X-specific.
Code Execution $5 per 1,000 calls Generated files/results travel; runtime state does not.
File Attachments search $10 per 1,000 calls Your files travel; attachment search state is provider-specific.
Collections Search $2.50 per 1,000 calls Collections are xAI-managed knowledge bases and are not cross-provider assets.
view_image / video tools Token-based Results travel; the tool path does not.

xAI portability read

xAI outputs travel well: text, code, JSON, and files are reusable. The sticky pieces are Collections, attachment search state, and the workflow around native tools. xAI is attractive if you want cheap search-grounded or tool-heavy experimentation quickly, but it becomes more provider-specific once Collections become central to the product.

Mistral

Mistral has the strongest portability story in this comparison. That is not only because of price. It is because Mistral repeatedly sells artifacts you can reuse elsewhere: OCR output, embeddings, and open-weight models. Even when you use the hosted API, the strategic posture is different from a provider that wants to own your entire runtime.

Mistral is also one of the few vendors where the open-weight strategy materially changes the buying decision. You can prototype on hosted inference, then self-host or move across clouds later without redesigning the whole stack around a proprietary retrieval product.

Mistral hosted SKUs and utilities

SKU Input Output Notes
Mistral Large 3 $0.50 $1.50 Flagship open-weight model.
Mistral Medium 3.1 $0.40 $2.00 Mid-tier general model.
Mistral Small 3.2 $0.10 $0.30 Cheap general text model.
Devstral Small 2 $0.10 $0.30 Cheap code-oriented model.
Codestral $0.30 $0.90 Code model.
Voxtral Small $0.10 $0.30 Also published as $0.004 per minute for audio.
Mistral Embed $0.10 per 1M tokens n/a Text embeddings.
Codestral Embed $0.15 per 1M tokens n/a Code embeddings.
OCR 3 $2 per 1,000 pages n/a Portable OCR / document extraction.
OCR 3 Annotated $3 per 1,000 pages n/a Annotated OCR output.

Mistral portability read

This is the cleanest escape-hatch story in the market right now. OCR output is reusable anywhere. Open-weight checkpoints can move across clouds or on-prem infrastructure. Embeddings are exportable. If you want the lowest medium-term switching cost, Mistral is one of the strongest answers.

Cohere

Cohere remains very interesting for enterprise text and retrieval-heavy work. Command A is not the cheapest flagship, but Command R and Command R7B are aggressively priced, and Cohere's broader product story still centers on retrieval, control, and private deployment rather than maximal consumer-platform sprawl.

The important caveat is documentation visibility. In the public docs view I could verify today, Cohere clearly exposed the current generation prices for Command A, Command R, and Command R7B, and it clearly explained that Rerank is query-priced and Embed is token-priced. But the static public docs view did not expose a stable current dollar figure for every one of those non-generation SKUs. Rather than guess or mix cloud-channel mirrors into the core row, I am marking those cells honestly.

Cohere current public rows

SKU Input Output Notes
Command A $2.50 $10.00 Current flagship enterprise text model.
Command A Reasoning Not cleanly exposed in the public static docs view Not cleanly exposed in the public static docs view Public model exists; I did not verify a stable paid list price row in the static view.
Command A Vision Not cleanly exposed in the public static docs view Not cleanly exposed in the public static docs view Public model exists; pricing row was not exposed in the snippets I could verify.
Command R $0.15 $0.60 Competitive general text tier.
Command R7B $0.0375 $0.15 Very low cost named enterprise model.
Command R+ 08-2024 $2.50 $10.00 Older refresh row documented in Cohere's changelog. Treat as legacy unless your account still exposes it.
Rerank models Query-priced n/a Cohere's public docs explain the pricing model, but the static public view I reviewed did not surface a stable current dollar figure on Cohere's own site.
Embed models Token-priced n/a Same caveat: pricing model is public, but the static docs view I reviewed did not surface a stable current dollar figure on Cohere's own site.

Cohere portability read

Cohere is relatively portable if you keep your own vector database or use private deployment. Rerank scores and embeddings are not inherently provider-trapped artifacts. The more "sticky" Cohere story, if you choose it, is around managed enterprise deployment and private hosting rather than a giant public-tooling ecosystem.

DeepSeek

DeepSeek has one of the simplest serious price sheets in the market. It currently exposes a very low cache-hit price, a still-low cache-miss input price, and a low output price across both the non-thinking and thinking aliases. The pricing page is refreshingly direct.

The more strategically important point is API compatibility. DeepSeek explicitly supports OpenAI-compatible and Anthropic-compatible formats. That means the migration cost can be lower than the raw token price suggests, because you may not need to rewrite your whole client stack.
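What "OpenAI-compatible" buys you in migration terms is that the request shape stays the same and only the constants change. The sketch below builds a standard OpenAI-style chat-completions payload; the host is DeepSeek's documented API base, but the exact endpoint path and any auth details are assumptions you should verify against the current API reference. The model names come from the pricing rows below.

```python
# Hedged sketch: an OpenAI-compatible provider means switching is mostly
# a base-URL and model-name change, not a client rewrite. The endpoint
# path is the standard OpenAI chat-completions shape and is an assumption
# here; verify against DeepSeek's current API reference before relying on it.
import json

def build_chat_request(base_url: str, model: str, user_text: str):
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, json.dumps(payload)

# The same builder targets either alias; only the model name differs.
chat_req = build_chat_request("https://api.deepseek.com", "deepseek-chat", "hi")
reasoner_req = build_chat_request("https://api.deepseek.com", "deepseek-reasoner", "hi")
print(chat_req[0])
```

The practical consequence: your retry logic, streaming handlers, and payload builders survive the switch, which is a real cost saving that never shows up on the token sheet.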

DeepSeek current public rows

SKU Cache-miss input Cache-hit input Output Notes
deepseek-chat $0.28 $0.028 $0.42 Current sheet maps this alias to DeepSeek-V3.2 in non-thinking mode.
deepseek-reasoner $0.28 $0.028 $0.42 Current sheet maps this alias to DeepSeek-V3.2 in thinking mode.

DeepSeek portability read

The cache itself is provider-specific, but the overall stack is unusually portable because the API surface mirrors other major ecosystems. That is one of the few cases where provider compatibility is almost as valuable as provider price.

Qwen via Alibaba Cloud Model Studio

Qwen can look astonishingly cheap, but it is easy to read the pricing page incorrectly. Deployment mode, region, context tier, and thinking mode all affect the bill. There is no single "Qwen price". There are several Qwen price surfaces, and the cheapest headline row usually belongs to a specific deployment mode and context bracket.

That complexity does not make Qwen unattractive. In fact it is often the opposite. It means the vendor is exposing very aggressive pricing if you can live with the right deployment mode and you understand the tiering. But you do have to read the table carefully.

Qwen deployment-mode overview

Deployment mode Qwen3-Max Qwen3.5-Plus Qwen3.5-Flash Notes
International Starts at $1.20 / $6.00 Starts at $0.40 / $2.40 Starts at $0.10 / $0.40 Higher-priced global commercial tier.
Global Starts at $0.359 / $1.434 Starts at $0.115 / $0.688 Starts at $0.029 / $0.287 Cheapest captured mode for current flagship rows.
US n/a in captured rows Qwen-Plus starts at $0.40 / $1.20 Qwen-Flash starts at $0.05 / $0.40 US uses qwen-plus and qwen-flash naming in the captured rows.
Chinese Mainland Starts at $0.359 / $1.434 Starts at $0.115 / $0.688 Starts at $0.029 / $0.287 Current mainland minima match the captured rows.

Qwen detailed tiered rows captured from the public pricing page

SKU Tier Input Output Notes
qwen3-max / qwen3-max-2026-01-23 0-32K $0.359 $1.434 Global row.
qwen3-max / qwen3-max-2026-01-23 32K-128K $0.574 $2.294 Global row.
qwen3-max / qwen3-max-2026-01-23 128K-252K $1.004 $4.014 Global row.
qwen3-max-2025-09-23 / qwen3-max-preview 0-32K $0.861 $3.441 Older, more expensive snapshot/previews.
qwen3-max-2025-09-23 / qwen3-max-preview 32K-128K $1.434 $5.735 Older, more expensive snapshot/previews.
qwen3-max-2025-09-23 / qwen3-max-preview 128K-252K $2.151 $8.602 Older, more expensive snapshot/previews.
qwen3.5-flash 0-128K $0.029 $0.287 Global row. Supports batch and context cache discounts.
qwen3.5-flash 128K-256K $0.115 $1.147 Global row.
qwen3.5-flash 256K-1M $0.172 $1.720 Global row.
qwen-flash-us 0-256K $0.05 $0.40 US row. Supports batch at half price.
qwen-flash-us 256K-1M $0.25 $2.00 US row. Supports batch at half price.
qwen3.5-plus 0-256K $0.40 $2.40 International row.
qwen3.5-plus 256K-1M $0.50 $3.00 International row.
qwen-plus non-thinking 0-256K $0.40 $1.20 International row.
qwen-plus non-thinking 256K-1M $1.20 $3.60 International row.
qwen-plus thinking Tiered Not cleanly exposed in the public static capture Not cleanly exposed in the public static capture Thinking-mode pricing exists and is materially higher than non-thinking mode.
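The tiered rows above mean there is no single qwen3-max price; cost is a function of prompt size. The sketch below uses the qwen3-max Global tiers from the table. One billing detail is an assumption: it treats the whole request as billed at its tier's rate, rather than only the marginal tokens, which you should verify against the current Model Studio docs.

```python
# Hedged sketch: pick the billing tier from prompt size and compute call
# cost. Tiers are the qwen3-max Global rows from the table above, per 1M
# tokens. Whole-request tiering is an assumption to verify.

QWEN3_MAX_GLOBAL = [   # (tier ceiling in input tokens, $ input, $ output)
    (32_000, 0.359, 1.434),
    (128_000, 0.574, 2.294),
    (252_000, 1.004, 4.014),
]

def qwen_call_cost(input_toks: int, output_toks: int) -> float:
    for ceiling, in_rate, out_rate in QWEN3_MAX_GLOBAL:
        if input_toks <= ceiling:
            # Assumption: the entire request bills at its tier's rates.
            return input_toks / 1e6 * in_rate + output_toks / 1e6 * out_rate
    raise ValueError("input exceeds the 252K ceiling in this sheet")

# Crossing the 128K boundary raises both the per-token rate and total spend.
small = qwen_call_cost(30_000, 2_000)
large = qwen_call_cost(200_000, 2_000)
print(round(small, 5), round(large, 5))
```

This is the concrete version of the warning above: the cheapest headline row belongs to one deployment mode and one context bracket, and your real prompts may not live there.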

Commercial multimodal Qwen families listed without a clean numeric row in the captured static view

Family Status Price visibility Portability
Qwen-VL Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Model outputs travel; Model Studio deployment economics do not.
QVQ Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Same caveat.
Qwen-Omni Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Same caveat.
Qwen-Omni-Realtime Commercial multimodal family is publicly listed The static excerpts I reviewed did not expose a clean numeric row Same caveat.

Qwen portability read

The hosted Model Studio economics are Alibaba-specific. The broader Qwen family is not. That combination matters. If you want a very cheap hosted commercial tier right now, Qwen is attractive. If you want an exit path later, the wider open-source/open-weight Qwen ecosystem is one of the main reasons the family matters strategically.

A note on Meta and the Llama API

Meta's Llama ecosystem is strategically important and absolutely belongs in any serious conversation about portability, self-hosting, and open-weight alternatives. But I am not putting a numeric Llama API row into the main price comparison because, in the public docs view I could verify today, I could not confirm a stable public pay-as-you-go token sheet accessible without preview gating or login. That is an honesty choice, not an omission by accident.

What can you actually reuse across providers?

This is the part many teams get wrong. They think they are buying "AI" from a provider, when in practice they are buying a mix of portable artifacts and provider-only state.

Provider-by-provider portability map

Provider Outputs that travel well Provider-bound services
OpenAI Text, JSON, code, transcripts, generated files, original uploaded files File Search vector stores, Web Search session state, Containers, response/thread/runtime state
Anthropic Text, JSON, tool results, fetched web content, generated files Prompt cache, code execution container state, computer-use runtime state
Google Text, grounded links/snippets, generated files, original uploaded files Search grounding session state, Maps grounding, cache storage, File Search retrieval state
Amazon Nova / Bedrock Text, generated assets, source files Knowledge Bases, Guardrails, routing, prompt optimization workflow, browser/agent runtimes
xAI Text, JSON, code, generated files, source attachments Collections, attachment search state, tool/runtime state
Mistral Text, OCR output, embeddings, open-weight checkpoints, source files Very little compared with the rest; the main lock-in appears only if you let Mistral own retrieval instead of owning it yourself
Cohere Text, rerank scores, embeddings, private deployment artifacts Managed public API usage is portable enough; Model Vault and private deployment reduce lock-in further
DeepSeek Text, JSON, source files, cached prompt-independent outputs Prompt cache is provider-specific, but the API surface is unusually portable because it mirrors other ecosystems
Qwen / Alibaba Cloud Text, JSON, open-source/open-weight family artifacts, source files Model Studio deployment mode economics, cache, and batch plumbing are Alibaba-specific

What moves cleanly, what does not

Artifact or service Cross-provider? Why Examples
Raw outputs: text, JSON, code Yes They are just artifacts once you store them outside the vendor. Any chat/completion output from any provider.
Source files: PDFs, CSVs, images, audio Yes Keep the originals in your own storage. The same PDF can be sent to OpenAI, Anthropic, Google, Mistral, or xAI.
OCR and transcripts Yes Plain text, markdown, tables, and JSON are portable. Mistral OCR output; OpenAI or Google transcripts; Whisper transcripts.
Embeddings and vectors Partly You can export vectors, but in practice you usually re-embed when you switch retrieval models so query and document embeddings stay aligned. Gemini Embedding, Mistral Embed, Cohere Embed, OpenAI text-embedding-3.
Prompt cache No Cache state is defined by the provider's own runtime and token accounting. OpenAI cached input, Anthropic cache write/read, Google cache storage, DeepSeek cache-hit pricing.
Hosted file search / managed vector stores No You can move the source files, not the hosted index as a portable service. OpenAI File Search, Google File Search, xAI Collections, Bedrock Knowledge Bases.
Grounded web or map search results Partly Returned links/snippets are reusable, but the tool path and session state are provider-specific. OpenAI Web Search, Anthropic Web Search, Google Search/Maps grounding, xAI Web Search/X Search.
Code execution and containers No, except for exported outputs The runtime state lives inside the provider. OpenAI Containers, Anthropic code execution, Google code execution, xAI code execution, Nova Act workflows.
Open-weight models and self-hosted checkpoints Yes You can move them across clouds or self-host them. Mistral open-weight models, the wider Qwen family, Meta Llama families.
API client compatibility Partly If a provider mirrors another provider's API shape, migration is easier even if the model itself is different. DeepSeek supports OpenAI-compatible and Anthropic-compatible formats.

That table gives you a simple rule of thumb:

  • source artifacts usually travel
  • final outputs usually travel
  • provider-managed retrieval usually does not
  • prompt caches do not
  • runtimes and containers do not
  • embeddings are only partly portable in practice
  • open-weight checkpoints are the strongest long-term escape hatch

A simple example: search-grounded pricing can invert the ranking

If your workflow calls search on every turn, the search line itself can dominate the bill surprisingly quickly.

Provider Search tool line only What that means
xAI $5 per 1,000 calls Cheapest clearly exposed first-party search tool fee in this review.
OpenAI $10 per 1,000 calls for standard/reasoning preview modes, plus search content tokens The tool line is moderate, but tokens still apply.
Anthropic $10 per 1,000 searches, plus normal tokens Very similar headline tool fee to OpenAI.
Google Gemini 3 $14 per 1,000 search queries after free tier Cheaper than Gemini 2.5 grounding, pricier than xAI and OpenAI/Anthropic.
Google Gemini 2.5 $35 per 1,000 grounded prompts after free tier Grounding is meaningfully more expensive here than on Gemini 3.

That is before you add the model tokens. So a provider that looks cheap on generation alone can become expensive in a search-heavy product, and a provider that looks expensive on generation alone can become reasonable if its tool line is cheaper or more predictable for your workload.
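Folding the search line into a per-turn bill shows the inversion numerically. In the sketch below, the two "providers" are illustrative composites, not specific vendors: the $35 per 1,000 grounding fee echoes the Gemini 2.5 row and the $5 per 1,000 fee echoes the xAI row, while the token rates and per-turn token counts are assumptions.

```python
# Hedged sketch: total cost for a search-on-every-turn workload. Search
# fees echo rows from the table above; token rates and per-turn token
# counts are illustrative assumptions, not vendor figures.

def turns_cost(n_turns, search_fee_per_k, in_rate, out_rate,
               in_toks=2_000, out_toks=500):
    tokens = n_turns * (in_toks / 1e6 * in_rate + out_toks / 1e6 * out_rate)
    search = n_turns / 1_000 * search_fee_per_k
    return round(tokens + search, 2)

# Provider A: cheap tokens ($0.20 in / $0.50 out) but a $35/1K grounding fee.
# Provider B: pricier tokens ($3 in / $15 out) but a $5/1K search fee.
a = turns_cost(10_000, 35.0, 0.20, 0.50)
b = turns_cost(10_000, 5.0, 3.00, 15.00)
print(a, b)  # the cheap-token provider ends up more expensive here
```

With these assumed numbers, the search fee dominates the bill for the cheap-token provider, which is exactly the ranking inversion the table above warns about.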

The practical buying guide

Here is the cleanest way to think about the market right now.

If you want the cheapest text possible

Start with the low-cost rows that are genuinely public and current: Qwen3.5-Flash Global, Cohere Command R7B, Amazon Nova Micro, Mistral Small 3.2, Devstral Small 2, Gemini 2.5 Flash-Lite, and DeepSeek. Then test the actual output quality you need. The price floor is crowded now. The interesting part is no longer finding a cheap model. It is finding a cheap model that does not force you into expensive surrounding services later.

If you want premium reasoning without instantly paying premium-platform tax

Gemini 2.5 Pro, Grok 4.20, and Qwen3-Max all look more price-aggressive on list price than the top Anthropic and OpenAI premium rows. That does not tell you which one is best for your tasks. It tells you where to begin if you want to avoid assuming that the most visible premium models are also the most financially sensible.

If you want the easiest long-context economics

Anthropic is the cleanest answer in the public sheets I reviewed because Opus 4.6 and Sonnet 4.6 keep 1M context at standard rates. OpenAI and Google can still be excellent choices, but their price jumps are explicit enough that you should model them before you architect around giant prompts.

If you want the lowest future switching pain

Mistral plus your own storage and retrieval stack is probably the cleanest answer. Anthropic with self-managed retrieval is also good. DeepSeek is especially interesting if you already have OpenAI-flavored or Anthropic-flavored client code and want lower migration friction. The deeper you go into File Search, Collections, Knowledge Bases, cache storage, containers, or provider-native search/grounding, the less meaningful it becomes to say that the model layer is portable.

Common questions about AI token pricing

What actually determines the real cost of an LLM stack in 2026?

In this comparison, the real cost is a cost shape, not just a token rate. Input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, long-context repricing, search-grounding fees, hosted retrieval fees, OCR or document parsing charges, and runtime costs can all change the final bill materially.

Is headline token price enough to compare providers?

No. Headline token pricing still matters, but this article argues it is incomplete on its own. Search, retrieval, file processing, runtime, and provider-owned state can all matter enough to change which stack is actually cheaper for a real workload.

Which providers have the lowest future switching pain?

The shortlist in this article is Mistral plus your own storage and retrieval stack, Anthropic with self-managed retrieval, and DeepSeek if API compatibility matters to you. In each case, the lower switching pain comes from keeping retrieval and long-lived state under your control.

Are embeddings and managed file search portable across providers?

Not in the same way. Source files, OCR output, transcripts, and final outputs travel well. Embeddings are only partly portable in practice because many teams re-embed when they switch retrieval models. Managed file search, hosted vector stores, caches, and runtime state are provider-specific.

Bottom line

By 2026, "token price" on its own is an incomplete buying metric. It still matters, but it does not tell you enough about the system you are actually paying for.

What you are really buying is a cost shape: model price, cache behavior, long-context behavior, tool pricing, retrieval pricing, file-processing pricing, runtime pricing, and ultimately the amount of provider-owned state your application accumulates.

If you care most about the lowest raw list price, the market now gives you several clear places to start testing. If you care most about premium managed stacks, OpenAI, Google, AWS, and xAI offer the deepest native ecosystems. If you care most about reducing future switching pain, Mistral, Anthropic with BYO retrieval, and DeepSeek remain the most strategically interesting combinations.

The most honest way to compare providers in 2026 is not to ask which model is cheapest. It is to ask which full cost shape you are actually willing to buy.

Official sources checked
