The real cost of generative AI tokens in 2026: the full provider-by-provider comparison

Most generative AI pricing roundups make the same mistake: they compare one flagship model per provider, quote a single input/output token rate, and treat that as the real bill.

By 2026, that is no longer enough. The real bill is a layered cost shape: input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, long-context tier jumps, search-grounding fees, hosted retrieval fees, OCR or document parsing fees, runtime charges, and sometimes separate browser automation or container pricing. This article compares the market the way buyers actually experience it: provider by provider, model by model, and mode by mode, while also separating portable value from provider-owned state.

Methodology and scope

A few ground rules matter for the rest of the comparison:

  • Verified facts: every numeric row below comes from official provider pricing documentation checked on 2026-06-17, unless a row explicitly says that the price was not cleanly exposed in the public static docs view.
  • Unit convention: token prices are USD per 1M tokens unless the row explicitly says per minute, per hour, per image, per second, per page, per 1,000 calls, or another non-token unit. Cache-hit, cache-write, and cache-storage prices are labeled separately because they are not interchangeable with normal input tokens.
  • Widely accepted expert consensus: total LLM cost is driven by more than plain input/output tokens. Caching, long context, retrieval, search, and runtime state change the economics materially.
  • Informed inference: the portability and lock-in judgments are mine. They are not vendor claims.
  • Speculation: none.
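
To make the unit convention concrete, here is a minimal sketch of how a per-1M-token price row turns into the cost of one request. The rates are the GPT-5.5 standard row from the comparison table in this article; the token counts are hypothetical, and the sketch assumes cached tokens are a subset of input tokens billed at the cache-hit rate, which is an assumption to verify for each provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. cached_input is the cache-hit (read) rate."""
    input: float
    cached_input: float
    output: float


def request_cost(rates: Rates, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    so only the remainder is billed at the normal input rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * rates.input
        + cached_tokens * rates.cached_input
        + output_tokens * rates.output
    )
    return total / 1_000_000


# GPT-5.5 standard short-context row from this article.
GPT_5_5_STANDARD = Rates(input=5.00, cached_input=0.50, output=30.00)

# Hypothetical request: 20K input tokens, 15K of them cache hits, 1K output.
cost = request_cost(GPT_5_5_STANDARD, input_tokens=20_000, cached_tokens=15_000, output_tokens=1_000)
print(f"${cost:.4f}")  # prints $0.0625
```

Cache writes and cache storage are deliberately left out here because, as noted above, they are separate line items with their own units.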

More specifically, this comparison covers current billable hosted SKUs and public priced modes from OpenAI, Anthropic, Google Gemini API, Amazon Nova, xAI, Mistral, Cohere, DeepSeek, and Qwen via Alibaba Cloud Model Studio. I am deliberately not mixing in Azure OpenAI, Vertex partner listings, or Bedrock resales of third-party models, because those are channel decisions layered on top of the underlying model economics.

I also need to be explicit about two boundaries. First, I am not exploding every historical alias, retired snapshot, or open-weight download artifact unless the provider still exposes it as a billable public row. Second, parts of the AWS Nova pricing page and parts of xAI's pricing matrix render dynamically. Where the official public page or official search snippet exposed a clean number, I include it. Where it did not, I mark the gap instead of guessing. Later sections keep those confidence notes short for the same reason.

Finally, this is a pricing-architecture article, not a benchmark ranking. A cheaper model is not automatically a better value for your use case, and an expensive model is not automatically overpriced. What follows is a cost comparison, not a capability leaderboard.

What "mode" means in this article

A provider no longer sells just "a model." It often sells several pricing modes around the same model:

  • standard vs batch/flex vs priority
  • reasoning vs non-reasoning
  • short-context vs long-context tier
  • text vs realtime vs audio vs image vs video
  • preview/beta vs general availability
  • self-contained inference vs provider-native tools such as search, file search, collections, code execution, or browser automation

That is why the tables below are wider than the usual blog post table. If you compress all of that into one number, you are not doing a comparison. You are doing marketing.

Quick answers by purpose

If you only need the conclusions, start here.

| Purpose | Best raw price floor | Best managed stack | Best portability | Main catch |
| --- | --- | --- | --- | --- |
| Cheapest commodity text generation | No single winner. Qwen3.5-Flash Global starts at $0.029 in / $0.287 out, Cohere Command R7B is $0.0375 / $0.15, and Amazon Nova Micro is $0.035 / $0.14. | Google Gemini 2.5 Flash-Lite or OpenAI GPT-5.4 nano if you want a large surrounding platform, not just a cheap model. | Mistral Small 4, DeepSeek, or Qwen if you keep storage and retrieval outside the model vendor. | Output quality, reasoning depth, context tiers, and tool fees can erase the headline savings. |
| Premium reasoning | On list price, Qwen3-Max Global, xAI Grok 4.3, and Gemini 3.1 Pro Preview look far cheaper than GPT-5.5 Pro and Claude Fable/Mythos 5. | OpenAI GPT-5.5 plus native tools, Gemini 3.1 Pro plus Search/Maps/File Search, or Claude Opus 4.8 plus managed-agent runtime. | Anthropic Sonnet 4.6 with your own retrieval, or DeepSeek if API compatibility matters more than platform depth. | This article compares cost architecture, not benchmark quality. Cheap frontier pricing is not a performance verdict. |
| Long-context work | Anthropic Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 keep 1M context at standard rates. | Gemini 3.1 Pro if you also want grounding, Maps, multimodal tools, and Gemini-native agent features. | Anthropic or Mistral with self-managed document storage. | OpenAI still has explicit short/long-context price tiers; Gemini Pro tiers change above 200K input. |
| Search-grounded answers | xAI Web Search is $5 per 1,000 calls. OpenAI and Anthropic are $10. Gemini 3 search is $14, and Gemini 2.5 grounding is $35. | Google if you want Search plus Maps and file tools in one stack. | Bring your own search layer, or treat search results as portable artifacts and not as provider session state. | Search fees are extra. They sit on top of the model tokens. |
| OCR and document extraction | Mistral OCR 3 at $2 per 1,000 pages, or $3 per 1,000 annotated pages. | OpenAI, Google, xAI, and AWS all offer document workflows, but they are usually retrieval stacks rather than plain OCR products. | Mistral OCR or any transcript/OCR pipeline that returns plain text, markdown, tables, or JSON. | File Search, Collections, and Knowledge Bases are not the same thing as OCR. They usually create provider-managed retrieval state. |
| Realtime voice | xAI Realtime is $0.05 per minute, while OpenAI gpt-realtime-2 is token-priced and OpenAI realtime translate/whisper are minute-priced. | OpenAI and Google both expose richer native multimodal realtime paths than the public price sheet alone suggests. | Transcripts travel. Realtime session state does not. | Per-minute, per-hour, and per-token realtime pricing are not directly comparable without usage assumptions. |
| Lowest switching pain | DeepSeek, Mistral small models, and Qwen are all good starting points if you keep the rest of the stack under your control. | None. Managed stacks are exactly where lock-in grows. | Mistral plus your own storage/vector DB, Anthropic with BYO retrieval, or DeepSeek because it supports OpenAI-style and Anthropic-style API formats. | The provider model is usually not the sticky part. The sticky part is retrieval, cache, search, and runtime state. |
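
The realtime row is a good place to show why usage assumptions matter. Below is a rough sketch that normalizes minute-priced and token-priced rows to USD per hour. The minute prices and the $32 per 1M audio-input rate come from this article; the tokens-per-minute figure is a hypothetical placeholder, not a provider number, and the token-priced line covers audio input only.

```python
MINUTES_PER_HOUR = 60


def per_minute_to_per_hour(usd_per_minute: float) -> float:
    """Convert a per-minute price to a per-hour price."""
    return usd_per_minute * MINUTES_PER_HOUR


def token_priced_to_per_hour(usd_per_1m_tokens: float, tokens_per_minute: float) -> float:
    """Convert a per-1M-token price to a per-hour price for a given token rate."""
    return usd_per_1m_tokens * tokens_per_minute * MINUTES_PER_HOUR / 1_000_000


# Hypothetical assumption: how many audio tokens one minute of speech consumes.
# Replace with a measured value from your own traffic.
ASSUMED_AUDIO_TOKENS_PER_MINUTE = 600

hourly = {
    "xAI Realtime ($0.05/min)": per_minute_to_per_hour(0.05),
    "OpenAI gpt-realtime-translate ($0.034/min)": per_minute_to_per_hour(0.034),
    "OpenAI gpt-realtime-2 audio input only ($32 per 1M tokens)": token_priced_to_per_hour(
        32.00, ASSUMED_AUDIO_TOKENS_PER_MINUTE
    ),
}

for name, usd in hourly.items():
    print(f"{name}: ${usd:.3f} per hour")
```

Changing the assumed token rate moves the token-priced line while the minute-priced lines stay fixed, which is exactly why these rows cannot be ranked without a workload in mind.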

The fastest way to misprice an LLM stack is to confuse four different things:

  1. generating text
  2. processing files
  3. retrieving from files later
  4. running the rest of the workflow around the model

Those are not the same product, and providers do not price them the same way.

The full 2026 AI token pricing comparison table

All prices below are public USD list prices. Unless otherwise noted, token prices are per 1M tokens. The Category column is there so you can filter the table by product type such as chat, voice, image, video, embeddings, OCR, or agent tooling.

| Provider | Category | Model or mode | Public list price | Notes |
| --- | --- | --- | --- | --- |
| OpenAI | Chat | GPT-5.5 standard | $5 in / $0.50 cached / $30 out per 1M tokens | Long-context tier is $10 / $1 / $45. |
| OpenAI | Chat | GPT-5.5 Batch/Flex | $2.50 in / $0.25 cached / $15 out | Long-context tier is $5 / $0.50 / $22.50. |
| OpenAI | Chat | GPT-5.5 Priority | $12.50 in / $1.25 cached / $75 out | Premium latency mode; no separate long-context row was exposed in the current public table. |
| OpenAI | Chat | GPT-5.5 Pro standard | $30 in / n/a cached / $180 out | Long-context tier is $60 in / $270 out. |
| OpenAI | Chat | GPT-5.5 Pro Batch/Flex | $15 in / n/a cached / $90 out | Current public table exposes only the short-context Batch/Flex row. |
| OpenAI | Chat | GPT-5.4 standard | $2.50 in / $0.25 cached / $15 out | Long-context tier is $5 / $0.50 / $22.50. |
| OpenAI | Chat | GPT-5.4 Pro standard | $30 in / n/a cached / $180 out | Long-context tier is $60 in / $270 out. Batch/Flex is $15 / $90 short context and $30 / $135 long context. |
| OpenAI | Chat | GPT-5.4 mini | $0.75 in / $0.075 cached / $4.50 out | Batch/Flex is $0.375 / $0.0375 / $2.25; Priority is $1.50 / $0.15 / $9. |
| OpenAI | Chat | GPT-5.4 nano | $0.20 in / $0.02 cached / $1.25 out | Cheapest current OpenAI general text model in the public pricing table. Batch/Flex is $0.10 / $0.01 / $0.625. |
| OpenAI | Voice | gpt-realtime-2 | Text $4 / $0.40 / $24; audio $32 / $0.40 / $64; image input $5 / $0.50 | Prices are per 1M tokens unless noted. |
| OpenAI | Voice | gpt-realtime-translate | $0.034 per minute | Minute-priced live translation model. |
| OpenAI | Voice | gpt-realtime-whisper | $0.017 per minute | Minute-priced streaming speech-to-text model. |
| OpenAI | Image | GPT Image 2 | Text $5 / $1.25 / n/a; image $8 / $2 / $30 per 1M tokens | Batch halves image rates to $4 / $1 / $15 and text input to $2.50 / $0.625. |
| OpenAI | Image | GPT Image 1.5 | Text $5 / $1.25 / $10; image $8 / $2 / $32 per 1M tokens | Batch halves image rates to $4 / $1 / $16 and text to $2.50 / $0.63 / $5. |
| OpenAI | Image | GPT Image 1 mini | Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 | Batch row is $1 / $0.10 text input/cached and $1.25 / $0.13 / $4 image. |
| OpenAI | Video | Sora 2 | $0.10 per second | Video generation. Not token-priced. |
| OpenAI | Video | Sora 2 Pro | 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec | Batch halves these rows to $0.15 / $0.25 / $0.35 per second. |
| OpenAI | Voice | gpt-4o-transcribe / diarize | $2.50 in / $10 out plus about $0.006 per minute | Speech-to-text rows are token-priced with an estimated minute cost. |
| OpenAI | Voice | gpt-4o-mini-transcribe | $1.25 in / $5 out plus about $0.003 per minute | Cheaper transcription tier. |
| OpenAI | Embeddings | text-embedding models | Current public pricing page reviewed did not expose a clean embedding table | Do not carry older embedding prices into estimates without verifying the model-specific price in your account or docs. |
| Anthropic | Chat | Claude Fable 5 | $10 in / $12.50 5m cache write / $20 1h cache write / $1 cache read / $50 out | 1M context stays at standard rates. |
| Anthropic | Chat | Claude Mythos 5 | $10 / $12.50 / $20 / $1 / $50 | Limited availability row. |
| Anthropic | Chat | Claude Opus 4.8 | $5 / $6.25 / $10 / $0.50 / $25 | 1M context stays at standard rates. Fast mode is $10 in / $50 out. |
| Anthropic | Chat | Claude Opus 4.7 | $5 / $6.25 / $10 / $0.50 / $25 | 1M context stays at standard rates. Fast mode is $30 in / $150 out. |
| Anthropic | Chat | Claude Opus 4.6 | $5 / $6.25 / $10 / $0.50 / $25 | 1M context stays at standard rates. Fast mode is $30 in / $150 out. |
| Anthropic | Chat | Claude Opus 4.5 | $5 / $6.25 / $10 / $0.50 / $25 | Same base list price as Opus 4.6 and 4.7. |
| Anthropic | Chat | Claude Opus 4.1 | $15 / $18.75 / $30 / $1.50 / $75 | Deprecated row still listed. |
| Anthropic | Chat | Claude Opus 4 | $15 / $18.75 / $30 / $1.50 / $75 | Retired except on Vertex AI. |
| Anthropic | Chat | Claude Sonnet 4.6 | $3 / $3.75 / $6 / $0.30 / $15 | 1M context stays at standard rates. |
| Anthropic | Chat | Claude Sonnet 4.5 | $3 / $3.75 / $6 / $0.30 / $15 | Still listed. |
| Anthropic | Chat | Claude Sonnet 4 | $3 / $3.75 / $6 / $0.30 / $15 | Retired except on Bedrock and Vertex AI. |
| Anthropic | Chat | Claude Haiku 4.5 | $1 / $1.25 / $2 / $0.10 / $5 | Cheapest current Claude 4.x tier. |
| Anthropic | Chat | Claude Haiku 3.5 | $0.80 / $1.00 / $1.60 / $0.08 / $4 | Retired except on Bedrock and Vertex AI. |
| Google | Chat | Gemini 3.5 Flash | Text/image/video $1.50 in; audio not separately exposed in the standard table; $9 out | Cache tokens $0.15, storage $1.50 per 1M tokens/hour, batch/flex $0.75 in / about $4.50 out, priority $2.70 in / $16.20 out. |
| Google | Voice | Gemini 3.5 Live Translate | $3.50 audio in / $21 audio out per 1M tokens | Google also presents approximate minute equivalents: $0.0053/min input and $0.0315/min output. |
| Google | Chat | Gemini 3.1 Pro Preview | <=200K: $2 in / $12 out; >200K: $4 / $18 | Cache tokens $0.20 or $0.40; cache storage $4.50 per 1M tokens per hour; batch/flex halves input/output; Search and Maps are $14 per 1,000 queries after free tier. |
| Google | Chat | Gemini 3 Flash Preview | Text/image/video $0.50 in; audio $1.00 in; $3 out | Cache tokens $0.05 or $0.10; storage $1 per 1M tokens per hour; batch/flex input $0.25 or $0.50 audio and output $1.50. |
| Google | Chat | Gemini 3.1 Flash-Lite | Text/image/video $0.25 in; audio $0.50 in; $1.50 out | Cache tokens $0.025 or $0.05; storage $0.50 per 1M tokens per hour; batch/flex halves input/output. |
| Google | Image | Gemini 3.1 Flash Image | Text/image input $0.50; text+thinking output $3 | Batch input/output is $0.25 / $1.50. |
| Google | Image | Gemini 3 Pro Image | Text/image input $2.00; text+thinking output $12 | Batch/flex input/output is $1 / $6; priority is $3.60 / $21.60. |
| Google | Chat | Gemini 2.5 Pro | <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 | Cache tokens $0.125 or $0.25; storage $4.50/hr; batch halves input/output; Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000. |
| Google | Chat | Gemini 2.5 Flash | Text/image/video $0.30 in; audio $1.00 in; $2.50 out | Cache tokens $0.03 or $0.10; storage $1/hr; batch input $0.15 or $0.50 audio and output $1.25. |
| Google | Chat | Gemini 2.5 Flash-Lite | Text/image/video $0.10 in; audio $0.30 in; $0.40 out | Cache tokens $0.01 or $0.03; storage $1/hr; batch input $0.05 or $0.15 audio and output $0.20. |
| Google | Chat | Gemini 2.5 Flash-Lite Preview | Same list price as Gemini 2.5 Flash-Lite | Preview row still separately listed. |
| Google | Voice | Gemini 2.5 Flash Native Audio | Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 | Live API row. |
| Google | Image | Gemini 2.5 Flash Image | Text/image input $0.30; image output $0.039 per image | Batch input $0.15; image output $0.0195 per image. |
| Google | Voice | Gemini 2.5 Flash Preview TTS | Text input $0.50; audio output $10 | Batch $0.25 in / $5 out. |
| Google | Voice | Gemini 2.5 Pro Preview TTS | Text input $1.00; audio output $20 | Batch $0.50 in / $10 out. |
| Google | Agent | Gemini Robotics-ER 1.6 Preview | Text/image/video $1.00 in; audio $2.00 in; $5.00 out | Batch row is $0.50 or $1.00 audio input and $2.50 output. Search grounding is $14 per 1,000 search queries after free tier. |
| Google | Agent | Gemini 2.5 Computer Use Preview | <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 | Specialized computer-use SKU. |
| Google | Chat | Gemini 2.0 Flash | Text/image/video $0.10 in; audio $0.70 in; $0.40 out | Deprecated, shutdown June 1, 2026. Batch $0.05 or $0.35 audio and $0.20 output. |
| Google | Chat | Gemini 2.0 Flash-Lite | $0.075 in / $0.30 out | Deprecated, shutdown June 1, 2026. Batch $0.0375 / $0.15. |
| Google | Embeddings | Gemini Embedding 2 | Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 | Batch halves each modality. Google also gives image/audio/video equivalent units. |
| Google | Embeddings | Gemini Embedding 001 | $0.15 per 1M text tokens | Batch $0.075. |
| Google | Image | Imagen 4 Fast | $0.02 per image | Image generation. |
| Google | Image | Imagen 4 Standard | $0.04 per image | Image generation. |
| Google | Image | Imagen 4 Ultra | $0.06 per image | Image generation. |
| Google | Video | Veo 3.1 Standard | 720p/1080p $0.40/sec; 4K $0.60/sec | Video generation. |
| Google | Video | Veo 3.1 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Cheaper video tier. |
| Google | Video | Veo 3.1 Lite | 720p $0.05/sec; 1080p $0.08/sec | Lowest current Veo 3.1 public row. |
| Google | Video | Veo 3 Standard | $0.40 per second | Older video generation row. |
| Google | Video | Veo 3 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Older fast video row. |
| Google | Video | Veo 2 | $0.35 per second | Older video generation row. |
| Google | Chat | Gemma 4 | No paid list price in the captured pricing page; free-tier rows only | Important strategically, but not a normal paid API row in the page reviewed. |
| Amazon Nova | Chat | Nova Micro | $0.035 in / $0.14 out | The cheapest clearly exposed Nova text tier in the official public pricing snippets. |
| Amazon Nova | Chat | Nova Lite | $0.06 in / $0.24 out | Low-cost general Nova tier. |
| Amazon Nova | Chat | Nova Pro | $0.80 in / $3.20 out | Base Pro tier. |
| Amazon Nova | Chat | Nova Pro latency optimized | $1.00 in / $4.00 out | Lower latency premium mode. |
| Amazon Nova | Chat | Nova 2 Lite | Pricing page snippet showed $0.30 input / $2.50 output | The public static view did not expose the full labeled row cleanly enough for a more detailed reproduction. |
| Amazon Nova | Chat | Nova 2 Pro Preview | Pricing page snippet showed a row beginning with repeated $2.1875 values and $17.50 output | The static public view did not expose the table labels cleanly enough to quote the full row with confidence. |
| Amazon Nova | Chat | Nova 2 Omni Preview | Product listed; public static docs view did not expose a clean list price | Exists on the Nova model catalog page. |
| Amazon Nova | Voice | Nova 2 Sonic | Product listed; public static docs view did not expose a clean list price | Speech model family. |
| Amazon Nova | Chat | Nova Premier | Product listed; public static docs view did not expose a clean list price | Most capable multimodal understanding model. |
| Amazon Nova | Image | Nova Canvas | Product listed; public static docs view did not expose a clean list price | Image generation. |
| Amazon Nova | Video | Nova Reel | Product listed; public static docs view did not expose a clean list price | Video generation. |
| Amazon Nova | Embeddings | Nova Multimodal Embeddings | Product listed; public static docs view did not expose a clean list price | Embedding model. |
| Amazon Nova | Agent | Nova Act | $4.75 per agent-hour | Browser automation / computer-use workflow pricing. |
| Amazon Nova | Tooling | Nova Forge | Annual subscription; public price not disclosed on the public page | Model-building service, not token-priced. |
| Amazon Nova | Tooling | Bedrock Prompt Optimization | $0.03 per 1,000 tokens | This is a platform-side add-on, not a model. |
| xAI | Chat | Grok 4.3 | $1.25 in / $0.20 cached / $2.50 out | Current main chat row; the pricing page also maps current Grok 4.20 rows to the same list price. |
| xAI | Agent | Grok 4.20 Multi Agent 0309 | $1.25 in / $0.20 cached / $2.50 out | Separate current pricing row with same token rates. |
| xAI | Chat | Grok 4.20 0309 reasoning / non-reasoning | $1.25 in / $0.20 cached / $2.50 out | Both rows are listed at the same public token price. |
| xAI | Coding | Grok Build 0.1 / Grok Code Fast aliases | $1.00 in / $0.20 cached / $2.00 out | Code-specialized current row. |
| xAI | Voice | Realtime API | $0.05 per minute, or $3 per hour | Realtime voice billing line. |
| xAI | Voice | Text to speech | $15 per 1M characters | Speech generation. |
| xAI | Voice | Speech to text | $0.10/hour REST; $0.20/hour streaming | Speech-to-text uses hourly units, not tokens. |
| xAI | Image | Grok Imagine Image | $0.002 per input image; output starts at $0.02 per image | Quality mode lists $0.01 input image and $0.05-$0.07 output image depending on resolution. |
| xAI | Video | Grok Imagine Video | Input $0.01/sec or $0.002/image; output starts at $0.05/sec | Video 1.5 lists $0.01 input image and $0.08-$0.14/sec output depending on resolution. |
| Mistral | Chat | Mistral Large 3 | $0.50 in / $1.50 out | Open-weight model; strong portability story. |
| Mistral | Chat | Mistral Medium 3.5 | $1.50 in / $7.50 out | Current Medium row on Mistral's pricing page. |
| Mistral | Chat | Mistral Small 4 | $0.10 in / $0.30 out | Cheap general text tier. |
| Mistral | Chat | Ministral 3 | 3B $0.10 / $0.10; 8B $0.15 / $0.15; 14B $0.20 / $0.20 | Edge/lightweight family; prices are input/output per 1M tokens. |
| Mistral | Chat | Devstral Small 2 | $0.10 in / $0.30 out | Code-oriented small model. |
| Mistral | Chat | Devstral 2 | $0.40 in / $2.00 out | Larger open-weight coding/agentic row. |
| Mistral | Chat | Codestral | $0.30 in / $0.90 out | Code model. |
| Mistral | Voice | Voxtral Small | $0.004 per audio minute; $0.10 text input / $0.40 output | Audio-input model with both minute and token pricing exposed. |
| Mistral | Voice | Voxtral TTS | $0.016 per 1,000 characters | Text-to-speech generation. |
| Mistral | Embeddings | Mistral Embed | $0.10 per 1M tokens | Text embeddings. |
| Mistral | Embeddings | Codestral Embed | $0.15 per 1M tokens | Code embeddings. |
| Mistral | OCR | OCR 3 | $2 per 1,000 pages | Portable document extraction product. |
| Mistral | OCR | OCR 3 Annotated | $3 per 1,000 pages | Annotated OCR output. |
| Cohere | Chat | Command A | $2.50 in / $10.00 out | Flagship enterprise text model. |
| Cohere | Chat | Command A Reasoning | Model is live; the public docs view I could verify did not expose a stable paid list price | Public docs list the model but not a clean price row in the static view I reviewed. |
| Cohere | Multimodal | Command A Vision | Model is live; the public docs view I could verify did not expose a stable paid list price | Multimodal Command variant. |
| Cohere | Chat | Command R | $0.15 in / $0.60 out | Very competitive general text tier. |
| Cohere | Chat | Command R7B | $0.0375 in / $0.15 out | One of the cheapest named enterprise text models in public docs. |
| Cohere | Chat | Command R+ 08-2024 | $2.50 in / $10.00 out | Older refresh row documented in Cohere's changelog. Treat as legacy unless the live model catalog still lists it for your account. |
| Cohere | Rerank | Rerank models | Query-priced rather than token-priced | The public docs explain the unit, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site. |
| Cohere | Embeddings | Embed models | Token-priced rather than generation-priced | The public docs explain the pricing model, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site. |
| DeepSeek | Chat | deepseek-v4-flash / deepseek-chat / deepseek-reasoner | $0.14 cache-miss in / $0.0028 cache-hit in / $0.28 out | Compatibility aliases map to DeepSeek-V4-Flash modes and are scheduled for deprecation on 2026-07-24 15:59 UTC. |
| DeepSeek | Chat | deepseek-v4-pro | $0.435 cache-miss in / $0.003625 cache-hit in / $0.87 out | Current public V4-Pro row. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max Global | Starts at $0.359 in / $1.434 out | Tiered by context: 0-32K, 32K-128K, 128K-252K. Batch and cache discounts available. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max International | Starts at $1.20 in / $6.00 out | Higher-priced deployment mode. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max Chinese Mainland | Starts at $0.359 in / $1.434 out | Minima match the captured mainland row. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Plus Global | Starts at $0.115 in / $0.688 out | Deployment mode and context tier both matter. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Plus International | Starts at $0.40 in / $2.40 out | 0-256K row shown in the public pricing page. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Flash Global | Starts at $0.029 in / $0.287 out | Tiered by context: 0-128K, 128K-256K, 256K-1M. Batch and cache discounts available. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Flash International | Starts at $0.10 in / $0.40 out | Higher-priced deployment mode. |
| Qwen / Alibaba Cloud | Chat | Qwen-Plus US | Starts at $0.40 in / $1.20 out in non-thinking mode | The public page also shows a much more expensive thinking mode, but the static capture did not expose the entire row cleanly enough to reproduce every number with confidence. |
| Qwen / Alibaba Cloud | Chat | Qwen-Flash US | Starts at $0.05 in / $0.40 out | Tiered by context: 0-256K and 256K-1M. Batch is half price where supported. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max snapshots | Older snapshot rows are more expensive than the current qwen3-max | Examples on the public page include qwen3-max-2025-09-23 and qwen3-max-preview. |
| Qwen / Alibaba Cloud | Multimodal | Qwen-VL / QVQ / Qwen-Omni / Qwen-Omni-Realtime | Commercial multimodal families are publicly listed; the static page excerpts I reviewed did not expose all of their numeric price rows | Do not assume the text-model prices apply to multimodal SKUs. |

A few patterns jump out immediately.

First, the cheapest raw text generation is not concentrated in one camp. Qwen3.5-Flash Global pushes the input floor very low. Cohere Command R7B and Amazon Nova Micro are also extremely cheap on output. Mistral Small 4 and Devstral Small 2 are still low enough to matter, and Gemini 2.5 Flash-Lite remains a serious budget option. DeepSeek stays attractive because its current price sheet is simple, cheap, and aggressively cached.

Second, premium reasoning is now surprisingly heterogeneous. Anthropic Fable 5 and Mythos 5 are expensive. OpenAI GPT-5.5 Pro and GPT-5.4 Pro are much more expensive again. Gemini 3.1 Pro Preview and xAI Grok 4.3 look cheaper on list price, and Qwen3-Max looks cheaper still in the captured Global mode rows. That does not settle the capability question, but it absolutely changes the budget conversation.

Third, "the cheapest provider" is often not the provider with the cheapest whole system. Search-grounded applications, document-heavy workflows, and agentic runtimes can shift the ranking once you add tool pricing, retrieval fees, or runtime charges. The next section shows one of the clearest examples: file handling and retrieval.
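
A small worked example of that third point, as a sketch under stated assumptions. The token and search prices are the xAI Grok 4.3 and Gemini 2.5 Pro (<=200K tier) rows quoted in this article; the workload shape is hypothetical, free tiers are ignored, and any tokens the search results add to the prompt are not modeled.

```python
def workload_cost(
    requests: int,
    in_tokens: int,
    out_tokens: int,
    in_rate: float,
    out_rate: float,
    search_usd_per_1k: float,
    searches_per_request: int = 1,
) -> tuple[float, float]:
    """Return (token_cost, search_cost) in USD for a batch of similar requests.

    in_rate and out_rate are USD per 1M tokens; search_usd_per_1k is USD per 1,000 calls.
    """
    token_cost = requests * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000
    search_cost = requests * searches_per_request * search_usd_per_1k / 1_000
    return token_cost, search_cost


# Hypothetical workload: 1,000 requests, 2,000 input and 500 output tokens each,
# with one search call per request.
rows = {
    "xAI Grok 4.3 + Web Search": (1.25, 2.50, 5.00),
    "Gemini 2.5 Pro + Search grounding": (1.25, 10.00, 35.00),
}

for name, (in_rate, out_rate, search_rate) in rows.items():
    tokens, search = workload_cost(1_000, 2_000, 500, in_rate, out_rate, search_rate)
    print(f"{name}: tokens ${tokens:.2f} + search ${search:.2f} = ${tokens + search:.2f}")
```

With this particular workload the search line is larger than the token line for both rows, so the tool fee, not the model rate, sets most of the bill. A workload with longer prompts or fewer searches would shift that balance.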

Why file processing is often where the lock-in starts

Providers often blur three very different offerings under the phrase "work with your files":

  • file parsing or OCR
  • embeddings plus retrieval
  • provider-managed file search or knowledge-base state

Those have very different portability properties.

If you pay for OCR and receive plain text, markdown, tables, or JSON, you can usually send that output to any other model later. That is portable value.

If you pay for embeddings and keep the vectors in your own vector database, that is partly portable. The data can move, but in practice many teams re-embed when they switch retrieval models so query and document vectors stay in the same space.

If you pay for a provider-managed file search layer, collections product, or knowledge-base service, the useful thing you are buying is no longer just the file. You are buying provider-owned retrieval state. Your PDF may be portable, but the managed index, cache, search path, and tool wiring are not.

That distinction matters more than most "token pricing" articles admit.

OpenAI

OpenAI is no longer just a model vendor. It is a managed runtime that layers retrieval, search, image generation, video generation, containers, speech, embeddings, and file infrastructure around the core inference API. That convenience is part of the appeal, but it also means the real cost of an OpenAI build can drift well beyond the headline model rate.

The biggest OpenAI pricing gotcha in 2026 is still long context. The current public sheet exposes separate short-context and long-context rates for the GPT-5.5 and GPT-5.4 families. That is a real pricing tier, not a minor footnote.
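
To see why the tier matters, here is a minimal sketch using the GPT-5.5 standard short-context and long-context rates from this article. Two things are assumptions on my part: that the long-context rate applies to the whole request once input crosses the cutoff, and the cutoff value itself, which is a hypothetical placeholder because this article does not state OpenAI's threshold.

```python
def tiered_cost(
    input_tokens: int,
    output_tokens: int,
    short_rates: tuple[float, float],
    long_rates: tuple[float, float],
    threshold_tokens: int,
) -> float:
    """Return USD cost, billing the whole request at the long-context tier
    when input_tokens exceeds threshold_tokens (an assumed billing rule)."""
    in_rate, out_rate = long_rates if input_tokens > threshold_tokens else short_rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# (input, output) in USD per 1M tokens, from the GPT-5.5 standard rows.
GPT_5_5_SHORT = (5.00, 30.00)
GPT_5_5_LONG = (10.00, 45.00)

# Hypothetical placeholder, not a documented value. Check the provider docs.
ASSUMED_THRESHOLD = 200_000

just_under = tiered_cost(200_000, 2_000, GPT_5_5_SHORT, GPT_5_5_LONG, ASSUMED_THRESHOLD)
just_over = tiered_cost(200_001, 2_000, GPT_5_5_SHORT, GPT_5_5_LONG, ASSUMED_THRESHOLD)

print(f"just under the cutoff: ${just_under:.2f}")
print(f"just over the cutoff:  ${just_over:.2f}")
```

Under these assumptions one extra input token roughly doubles the request cost, which is why prompt-size budgeting belongs in the estimate rather than in a footnote.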

OpenAI's public pricing still includes older and specialized rows. The tables below focus on the current GPT-5.5/GPT-5.4 stack plus the image, speech, and tool rows that matter most for real bill estimation.

OpenAI text models and purchase modes

| SKU | Mode | Input | Cached input | Output | Notes |
| --- | --- | --- | --- | --- | --- |
| GPT-5.5 | Standard short context | $5.00 | $0.50 | $30.00 | Current flagship row. |
| GPT-5.5 | Standard long context | $10.00 | $1.00 | $45.00 | Long-context tier. |
| GPT-5.5 | Batch/Flex short context | $2.50 | $0.25 | $15.00 | Half of standard short-context pricing. |
| GPT-5.5 | Batch/Flex long context | $5.00 | $0.50 | $22.50 | Half of standard long-context pricing. |
| GPT-5.5 | Priority | $12.50 | $1.25 | $75.00 | Premium latency mode. |
| GPT-5.5 Pro | Standard short context | $30.00 | n/a | $180.00 | Highest-end OpenAI text tier. |
| GPT-5.5 Pro | Standard long context | $60.00 | n/a | $270.00 | Long-context tier. |
| GPT-5.5 Pro | Batch/Flex short context | $15.00 | n/a | $90.00 | Current table does not expose a separate long-context Batch/Flex row. |
| GPT-5.4 | Standard short context | $2.50 | $0.25 | $15.00 | Previous flagship row, still priced. |
| GPT-5.4 | Standard long context | $5.00 | $0.50 | $22.50 | Long-context tier. |
| GPT-5.4 Pro | Standard short context | $30.00 | n/a | $180.00 | Previous Pro row, still priced. |
| GPT-5.4 Pro | Standard long context | $60.00 | n/a | $270.00 | Long-context tier. |
| GPT-5.4 Pro | Batch/Flex short context | $15.00 | n/a | $90.00 | Half of standard short-context pricing. |
| GPT-5.4 Pro | Batch/Flex long context | $30.00 | n/a | $135.00 | Half of standard long-context input; output shown as $135. |
| GPT-5.4 mini | Standard | $0.75 | $0.075 | $4.50 | Batch/Flex is $0.375 / $0.0375 / $2.25. |
| GPT-5.4 nano | Standard | $0.20 | $0.02 | $1.25 | Cheapest current OpenAI general text row in the public table. Batch/Flex is $0.10 / $0.01 / $0.625. |

OpenAI realtime, audio, and speech

| SKU | Text input | Text cached | Text output | Audio input | Audio cached | Audio output | Image input | Image cached | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-realtime-2 | $4.00 | $0.40 | $24.00 | $32.00 | $0.40 | $64.00 | $5.00 | $0.50 | Prices are per 1M tokens. |
| gpt-realtime-translate | n/a | n/a | n/a | n/a | n/a | $0.034 per minute | n/a | n/a | Minute-priced live translation. |
| gpt-realtime-whisper | n/a | n/a | n/a | n/a | n/a | $0.017 per minute | n/a | n/a | Minute-priced streaming transcription. |

| SKU | Price | Notes |
| --- | --- | --- |
| gpt-4o-transcribe / diarize | $2.50 input / $10 output per 1M tokens; about $0.006 per minute | Speech to text. |
| gpt-4o-mini-transcribe | $1.25 input / $5 output per 1M tokens; about $0.003 per minute | Cheaper speech to text. |

OpenAI image, video, and embeddings

| SKU | Price | Notes |
| --- | --- | --- |
| GPT Image 2 | Text $5 / $1.25 / n/a; image $8 / $2 / $30 per 1M tokens | Batch halves image token output to $15. |
| GPT Image 1.5 | Text $5 / $1.25 / $10; image $8 / $2 / $32 | Batch halves image token output to $16. |
| GPT Image 1 mini | Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 | Batch halves image token output to $4. |
| Sora 2 | $0.10 per second | Video generation. |
| Sora 2 Pro | 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec | Premium video generation. |
| text-embedding models | Current public pricing page reviewed did not expose a clean embedding table | Verify model-specific embedding pricing in the docs or account console before carrying older prices into an estimate. |

If you are comparing dedicated image and video generation APIs rather than broader model stacks, "Direct Provider Pricing for Image + Video Generation APIs (USD)" breaks out that market separately.

OpenAI tool and platform charges

| Service | Price | Portable output? |
| --- | --- | --- |
| Regional/data-residency endpoints for eligible models released on or after 2026-03-05 | +10% uplift | Regional deployment choice, not a portable artifact. |
| Containers | Per 20-minute session: 1GB $0.03; 4GB $0.12; 16GB $0.48; 64GB $1.92 | Generated files can be exported; the runtime state is OpenAI-only. |
| File Search storage | $0.10 per GB per day, first GB free | Your source files and extracted outputs can travel; the hosted vector store cannot. |
| File Search tool call | $2.50 per 1,000 calls | Calls use OpenAI-managed retrieval state. |
| Agent Kit file and image upload storage | $0.10 per GB-day after 1 GB free per account per month | Storage can be exported only if you also keep your own copy of the files. |
| Web Search tool | Standard/reasoning preview $10 per 1,000 calls plus search content tokens; non-reasoning preview $25 per 1,000 calls with free search content tokens | Returned URLs/snippets are portable; the tool and session state are OpenAI-only. |
| Moderation | Free | Yes, as a classification result. |
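
As a rough illustration of how the File Search rows add up, the sketch below estimates a monthly bill from the two prices in the table: $0.10 per GB per day with the first GB free, and $2.50 per 1,000 tool calls. The store size, call volume, and 30-day month are hypothetical, and the sketch assumes the store size is constant and the free GB is applied every day.

```python
STORAGE_USD_PER_GB_DAY = 0.10
FREE_STORAGE_GB = 1.0
TOOL_CALL_USD_PER_1K = 2.50


def file_search_monthly_cost(stored_gb: float, tool_calls: int, days: int = 30) -> float:
    """Return an estimated monthly File Search cost in USD.

    Assumes a constant store size and that the first GB is free on each day.
    Model tokens for the retrieved content are not included.
    """
    billable_gb = max(0.0, stored_gb - FREE_STORAGE_GB)
    storage_cost = billable_gb * STORAGE_USD_PER_GB_DAY * days
    call_cost = tool_calls * TOOL_CALL_USD_PER_1K / 1_000
    return storage_cost + call_cost


# Hypothetical month: an 11 GB vector store queried 40,000 times.
estimate = file_search_monthly_cost(stored_gb=11.0, tool_calls=40_000)
print(f"${estimate:.2f}")
```

None of this appears in the per-token model rate, and it stops being reusable the moment you leave the hosted vector store.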

OpenAI portability read

OpenAI is easy to adopt and easy to deepen. The tradeoff is that switching pain grows fastest when File Search, Web Search, containers, and other hosted runtime layers become central to the product.

Anthropic

Anthropic currently has the cleanest cache math in the market. It tells you exactly what cache writes cost, what cache reads cost, and how batch pricing halves the bill. It also has one of the cleanest long-context stories among premium vendors, because Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 keep 1M context at standard rates.
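
Because the multipliers are explicit (5-minute writes at 1.25x base input, 1-hour writes at 2x, reads at 0.1x), the break-even point for caching a repeated prompt prefix can be worked out directly. The sketch below is a simplified model: it assumes one cache write followed only by cache reads, with every reuse landing inside the cache lifetime. The function names and the example workload are mine, not Anthropic's.

```python
READ_MULTIPLIER = 0.1  # cache reads are 0.1x base input


def prefix_cost(prefix_tokens: int, uses: int, base_in_rate: float, write_multiplier=None) -> float:
    """Return USD input cost of sending the same prefix `uses` times.

    base_in_rate is USD per 1M tokens. With write_multiplier=None the prefix is
    never cached. Otherwise the first use is a cache write and the rest are reads.
    """
    per_use = prefix_tokens * base_in_rate / 1_000_000
    if write_multiplier is None:
        return uses * per_use
    return per_use * (write_multiplier + READ_MULTIPLIER * (uses - 1))


def break_even_uses(write_multiplier: float) -> int:
    """Smallest number of uses at which caching is strictly cheaper than not caching."""
    uses = 1
    while prefix_cost(1_000_000, uses, 1.0, write_multiplier) >= prefix_cost(1_000_000, uses, 1.0):
        uses += 1
    return uses


print(break_even_uses(1.25))  # 5-minute cache write
print(break_even_uses(2.0))   # 1-hour cache write

# Hypothetical workload: a 50,000-token prefix reused 20 times on Sonnet 4.6 ($3 input).
uncached = prefix_cost(50_000, 20, 3.00)
cached_5m = prefix_cost(50_000, 20, 3.00, write_multiplier=1.25)
print(f"uncached ${uncached:.4f} vs 5-minute cache ${cached_5m:.4f}")
```

In this simplified model the 5-minute cache pays for itself on the second use and the 1-hour cache on the third. Real traffic that lets the cache expire between calls pays for additional writes and will break even later.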

Anthropic models

| SKU | Input | 5m cache write | 1h cache write | Cache read | Output | Batch input | Batch output | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fable 5 | $10.00 | $12.50 | $20.00 | $1.00 | $50.00 | $5.00 | $25.00 | 1M context stays at standard rates. |
| Mythos 5 | $10.00 | $12.50 | $20.00 | $1.00 | $50.00 | $5.00 | $25.00 | Limited availability. |
| Opus 4.8 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M context stays at standard rates; fast mode is $10 in / $50 out. |
| Opus 4.7 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M context stays at standard rates; fast mode is $30 in / $150 out. |
| Opus 4.6 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M context stays at standard rates; fast mode is $30 in / $150 out. |
| Opus 4.5 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | Same base price as Opus 4.6 and 4.7. |
| Opus 4.1 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | $7.50 | $37.50 | Deprecated. |
| Opus 4 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | $7.50 | $37.50 | Retired except on Vertex AI. |
| Sonnet 4.6 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | $1.50 | $7.50 | 1M context stays at standard rates. |
| Sonnet 4.5 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | $1.50 | $7.50 | Still listed. |
| Sonnet 4 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | $1.50 | $7.50 | Retired except on Bedrock and Vertex AI. |
| Haiku 4.5 | $1.00 | $1.25 | $2.00 | $0.10 | $5.00 | $0.50 | $2.50 | Cheapest Claude 4.x tier. |
| Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | $4.00 | $0.40 | $2.00 | Retired except on Bedrock and Vertex AI. |

Anthropic tools and runtime costs

| Service | Price | Portability |
| --- | --- | --- |
| Prompt caching | 5-minute writes are 1.25x base input; 1-hour writes are 2x; reads are 0.1x | Provider-only cache state. |
| US-only inference on Claude API | 1.1x multiplier on all token categories for Opus 4.6 and newer | Regional deployment choice, not a portable artifact. |
| Regional endpoints on Bedrock/Vertex | +10% on current 4.5-family regional endpoints | Cloud-channel choice, not a portable artifact. |
| Web Search | $10 per 1,000 searches plus normal token charges | Returned sources travel; the tool/session state does not. |
| Web Fetch | No additional line-item charge beyond normal token usage | Fetched text/output is portable. |
| Code execution | Free when used with web search or web fetch; otherwise first 1,550 hours per org per month free, then $0.05 per hour per container, 5-minute minimum | Generated files travel; container state does not. |
| Claude Managed Agents session runtime | $0.08 per session-hour | Runtime state is Anthropic-managed; outputs can be stored elsewhere. |
| Bash tool overhead | Adds 245 input tokens for the tool definition | The overhead is Claude-specific. |
| Text editor overhead | Adds 700 input tokens for the tool definition | The overhead is Claude-specific. |
| Computer use overhead | Adds roughly 466-499 tokens to the system prompt, 735 input tokens per tool definition, plus screenshot vision tokens | The session/runtime state is provider-only. |
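
The overhead rows look small per request but scale with volume. The following Python sketch estimates what the Bash and text editor definitions and the Web Search fee add up to. It assumes the tool definitions are billed at the uncached base input rate on every request (I use the Sonnet 4.6 input price of $3.00); the function names and the request and search volumes are hypothetical.

```python
# Input-token overhead per tool definition, from the table above.
TOOL_OVERHEAD_TOKENS = {"bash": 245, "text_editor": 700}

WEB_SEARCH_FEE_PER_1000 = 10.00  # USD per 1,000 searches, from the table above


def tool_overhead_usd(tools, requests, input_price_per_million):
    """Cost of re-sending tool definitions on every request, in USD.

    Assumption: definitions are billed as uncached input tokens each time.
    """
    tokens_per_request = sum(TOOL_OVERHEAD_TOKENS[name] for name in tools)
    return tokens_per_request * requests * input_price_per_million / 1_000_000


def web_search_usd(searches):
    """Per-search fee only; tokens from search results are billed separately."""
    return searches * WEB_SEARCH_FEE_PER_1000 / 1_000


overhead = tool_overhead_usd(["bash", "text_editor"], requests=1_000_000,
                             input_price_per_million=3.00)
search_fees = web_search_usd(200_000)

print(f"tool definitions: ${overhead:,.2f}")
print(f"web search fees:  ${search_fees:,.2f}")
```

If the definitions sit inside a cached prefix, the overhead would be billed at the cache-read rate instead, which is one more reason to model cache behavior before estimating agent costs.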

Anthropic portability read

Anthropic stays relatively portable if you keep retrieval outside Claude and use the API mainly for reasoning and orchestration. The sticky pieces are cache, container state, and computer-use runtime.

Google Gemini API

Google is the clearest example of low model list prices living inside a large ecosystem. Gemini 2.5 Flash-Lite is still cheap at the model layer, while Gemini 3.1 Pro and Gemini 3.5 Flash are now the rows to watch higher up the stack. Google separately monetizes cache storage, grounding, embeddings, multimodal outputs, and file-search economics. Another important detail: Gemini output pricing explicitly includes thinking tokens, which makes direct output-token comparisons less clean.

Google core text and reasoning models

| SKU | Standard input | Standard output | Cache token price | Cache storage (per 1M tokens per hour) | Batch | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini 3.5 Flash | Text/image/video $1.50 | $9.00 | $0.15 text/img/video | $1.50/hr | Batch/Flex $0.75 in / $4.50 out | Priority is $2.70 in / $16.20 out. Output pricing includes thinking tokens. |
| Gemini 3.1 Pro Preview | <=200K $2.00; >200K $4.00 | <=200K $12.00; >200K $18.00 | $0.20 or $0.40 | $4.50 per 1M tokens/hour | Batch/Flex input/output halved | Search and Maps are $14 per 1,000 queries after free tier. Output pricing includes thinking tokens. |
| Gemini 3 Flash Preview | Text/image/video $0.50; audio $1.00 | $3.00 | $0.05 text/img/video; $0.10 audio | $1.00/hr | Batch/Flex input $0.25 or $0.50 audio; output $1.50 | Output pricing includes thinking tokens. |
| Gemini 3.1 Flash-Lite | Text/image/video $0.25; audio $0.50 | $1.50 | $0.025 text/img/video; $0.05 audio | $0.50/hr | Batch/Flex input $0.125 or $0.25 audio; output $0.75 | Priority is $0.45 input and $2.70 output. |
| Gemini 2.5 Pro | <=200K $1.25; >200K $2.50 | <=200K $10.00; >200K $15.00 | $0.125 or $0.25 | $4.50/hr | Input/output halved | Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000. Output pricing includes thinking tokens. |
| Gemini 2.5 Flash | Text/image/video $0.30; audio $1.00 | $2.50 | $0.03 text/img/video; $0.10 audio | $1.00/hr | Input $0.15 or $0.50 audio; output $1.25 | Output pricing includes thinking tokens. |
| Gemini 2.5 Flash-Lite | Text/image/video $0.10; audio $0.30 | $0.40 | $0.01 text/img/video; $0.03 audio | $1.00/hr | Input $0.05 or $0.15 audio; output $0.20 | Output pricing includes thinking tokens. |
| Gemini 2.5 Flash-Lite Preview | Same as 2.5 Flash-Lite | Same as 2.5 Flash-Lite | Same as 2.5 Flash-Lite | $1.00/hr | Same as 2.5 Flash-Lite | Preview row still listed separately. |
| Gemini Robotics-ER 1.6 Preview | Text/image/video $1.00; audio $2.00 | $5.00 | n/a | n/a | Batch $0.50 or $1 audio in / $2.50 out | Search grounding is $14 per 1,000 search queries after free tier. |
| Gemini 2.5 Computer Use Preview | <=200K $1.25; >200K $2.50 | <=200K $10.00; >200K $15.00 | n/a | n/a | n/a | Specialized computer-use row. |
| Gemini 2.0 Flash | Text/image/video $0.10; audio $0.70 | $0.40 | $0.025 text/img/video; $0.175 audio | $1.00/hr | Input $0.05 or $0.35 audio; output $0.20 | Deprecated. Shutdown June 1, 2026. |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | n/a | n/a | $0.0375 in / $0.15 out | Deprecated. Shutdown June 1, 2026. |
| Gemma 4 | Free-tier-only rows in the captured pricing page | Free-tier-only rows in the captured pricing page | n/a | n/a | n/a | Important open model family, but not shown as a normal paid Developer API row in the pricing page reviewed. |
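
Because cache storage is billed by the hour, a Gemini cache only pays off if it is reused often enough. The Python sketch below estimates that break-even rate from the rows above. It assumes every storage figure is quoted per 1M tokens per hour (as the Gemini 3.1 Pro Preview row states), treats text pricing only, and ignores the cost of creating the cache; the function name is mine.

```python
def breakeven_reuses_per_hour(input_price, cached_price, storage_per_hour):
    """Reuses per hour at which cache storage cost equals the input savings.

    All three prices are USD per 1M tokens (storage is per 1M tokens per hour),
    so the size of the cached prefix cancels out of the result.
    """
    saving_per_reuse = input_price - cached_price
    if saving_per_reuse <= 0:
        raise ValueError("cached tokens must be cheaper than standard input")
    return storage_per_hour / saving_per_reuse


# (standard input, cache token price, cache storage per hour) from the table above.
ROWS = {
    "Gemini 3.5 Flash": (1.50, 0.15, 1.50),
    "Gemini 2.5 Pro (<=200K)": (1.25, 0.125, 4.50),
    "Gemini 2.5 Flash": (0.30, 0.03, 1.00),
}

for name, (inp, cached, storage) in ROWS.items():
    rate = breakeven_reuses_per_hour(inp, cached, storage)
    print(f"{name}: {rate:.2f} reuses/hour to break even")
```

On these numbers a cached prefix on Gemini 2.5 Pro needs about four reads per hour before storage stops being a net cost, while Gemini 3.5 Flash needs only a little more than one.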

Google image, audio, TTS, and video

| SKU | Price | Notes |
| --- | --- | --- |
| Gemini 3.1 Flash Image | Text/image input $0.50; text+thinking output $3 | Batch row is $0.25 in / $1.50 out. |
| Gemini 3 Pro Image | Text/image input $2.00; text+thinking output $12 | Batch/Flex row is $1 in / $6 out; Priority is $3.60 in / $21.60 out. |
| Gemini 3.5 Live Translate | Audio input $3.50; audio output $21.00 | Google also shows approximate minute equivalents of $0.0053/min input and $0.0315/min output. |
| Gemini 2.5 Flash Native Audio | Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 | Live API native audio row. |
| Gemini 2.5 Flash Image | Text/image input $0.30; image output $0.039 per image | Batch input $0.15; image output $0.0195 per image. |
| Gemini 2.5 Flash Preview TTS | Text input $0.50; audio output $10.00 | Batch $0.25 in / $5 out. |
| Gemini 2.5 Pro Preview TTS | Text input $1.00; audio output $20.00 | Batch $0.50 in / $10 out. |
| Imagen 4 Fast | $0.02 per image | Image generation. |
| Imagen 4 Standard | $0.04 per image | Image generation. |
| Imagen 4 Ultra | $0.06 per image | Image generation. |
| Veo 3.1 Standard | 720p/1080p $0.40 per second; 4K $0.60 per second | Video generation. |
| Veo 3.1 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Cheaper video tier. |
| Veo 3.1 Lite | 720p $0.05/sec; 1080p $0.08/sec | Lowest current Veo 3.1 public row. |
| Veo 3 Standard | $0.40 per second | Older video row still listed. |
| Veo 3 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Older fast video row still listed. |
| Veo 2 | $0.35 per second | Older video row still listed. |

Google embeddings and tools

| Service | Price | Portability |
| --- | --- | --- |
| Gemini Embedding 2 | Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 | Batch halves these prices. Embeddings can be exported, but in practice you usually re-embed when you switch retrieval models. |
| Gemini Embedding 001 | $0.15 per 1M text tokens; batch $0.075 | Same caveat: vectors travel, but retrieval quality usually wants matching query/document models. |
| Google Search grounding | Gemini 2.5: $35 per 1,000 grounded prompts after free tier; Gemini 3: $14 per 1,000 search queries after free tier | Returned URLs/snippets travel; grounding service and session state do not. |
| Google Maps grounding | $25 per 1,000 grounded prompts after free tier | Returned data is reusable; the service itself is Google-only. |
| Code execution | No separate runtime charge; billed only as normal model tokens | Generated files/results can travel; runtime state does not. |
| URL context | No separate tool fee; billed as normal input tokens | Fetched text/output is portable. |
| File Search | Tool line itself is free, but embeddings are billed and retrieved document tokens are billed as model tokens | Your files travel; the hosted retrieval layer does not. |

Google portability read

Gemini itself is price-competitive. The switching cost rises when Search, Maps, cache storage, and File Search become part of the core workflow instead of just optional tools.

Amazon Nova and Bedrock-side economics

Amazon Nova can look extremely cheap on the pure model rows, especially Nova Micro and Nova Lite. In practice, though, you are buying Bedrock economics: model inference plus whatever you choose from Prompt Optimization, Guardrails, Knowledge Bases, Data Automation, browser automation, and other AWS-shaped platform features. If your data, IAM, networking, and operations already live in AWS, that integration value may dominate list-price differences.

Amazon Nova models

| SKU | Public list price | Confidence | Notes |
| --- | --- | --- | --- |
| Nova Micro | $0.035 in / $0.14 out | High | Clearly exposed in official public pricing snippets. |
| Nova Lite | $0.06 in / $0.24 out | High | Clearly exposed in official public pricing snippets. |
| Nova Pro | $0.80 in / $3.20 out | High | Clearly exposed in official public pricing snippets. |
| Nova Pro latency optimized | $1.00 in / $4.00 out | High | Clearly exposed in official public pricing snippets. |
| Nova 2 Lite | Pricing page snippet showed $0.30 input / $2.50 output | Medium | The public static page did not expose the labeled row cleanly enough for more detail. |
| Nova 2 Pro Preview | Pricing page snippet showed a row starting with repeated $2.1875 values and $17.50 output | Low | I am not reproducing the row as fully verified because the table labels were not visible in the static capture. |
| Nova 2 Omni Preview | Not cleanly exposed in the public static docs view | Low | Product exists on the model catalog page. |
| Nova 2 Sonic | Not cleanly exposed in the public static docs view | Low | Speech model family. |
| Nova Premier | Not cleanly exposed in the public static docs view | Low | Most capable multimodal understanding model. |
| Nova Canvas | Not cleanly exposed in the public static docs view | Low | Image generation model. |
| Nova Reel | Not cleanly exposed in the public static docs view | Low | Video generation model. |
| Nova Multimodal Embeddings | Not cleanly exposed in the public static docs view | Low | Embedding model. |

AWS platform-side charges around Nova

| Service | Price | Portability |
| --- | --- | --- |
| Nova Act | $4.75 per agent-hour | Generated browser outputs can be exported; the workflow runtime is AWS-specific. |
| Nova Forge | Annual subscription; public price not disclosed | Model-building workflow is provider-specific. |
| Bedrock Prompt Optimization | $0.03 per 1,000 tokens | Prompt suggestions can travel; the service itself is AWS-specific. |
| Knowledge Bases / Guardrails / Data Automation | Separate Bedrock charges exist, but the static public excerpts I captured did not expose every current numeric row cleanly | These services are AWS-shaped platform features, not portable assets. |
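
Note the unit change in this table: Prompt Optimization is quoted per 1,000 tokens, while the Nova model rows are quoted per 1M tokens. A small Python helper (the name is mine) makes the gap visible by putting both on the same per-1M basis.

```python
def to_per_million(price_usd, per_tokens):
    """Convert a price quoted per `per_tokens` tokens into USD per 1M tokens."""
    if per_tokens <= 0:
        raise ValueError("per_tokens must be positive")
    return price_usd * 1_000_000 / per_tokens


prompt_optimization = to_per_million(0.03, 1_000)     # quoted per 1,000 tokens
nova_micro_input = to_per_million(0.035, 1_000_000)   # already per 1M tokens

print(f"Prompt Optimization: ${prompt_optimization:.2f} per 1M tokens")
print(f"Nova Micro input:    ${nova_micro_input:.3f} per 1M tokens")
```

The two lines bill different things (a one-off optimization pass versus per-request inference), so this is a unit check rather than a like-for-like comparison. It still shows why platform services should be normalized before being added to a token budget.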

Amazon Nova portability read

Nova can be cheap at the model layer, but the real lock-in lives in the Bedrock wrapper: Knowledge Bases, Guardrails, routing, optimization, and agent/browser runtime state.

xAI

xAI is more cost-competitive than many people assume, especially if you separate model cost from ecosystem branding. The current public model page has a simpler shape than older Grok 4.20-era pricing: Grok 4.3 is $1.25 in / $2.50 out, Grok Build 0.1 is $1 in / $2 out, and the pricing payload also maps current Grok 4.20 rows to $1.25 in / $0.20 cached / $2.50 out. xAI's tool charges are unusually clear. The key distinction is between portable one-shot outputs and xAI-managed retrieval state such as Collections.

xAI models and modes

| SKU | Input | Cached input | Output | Notes |
| --- | --- | --- | --- | --- |
| Grok 4.3 | $1.25 | $0.20 | $2.50 | Current main chat row. The public model page exposes input and output; the embedded pricing payload exposes cached input. |
| Grok 4.20 0309 reasoning / non-reasoning | $1.25 | $0.20 | $2.50 | Current pricing payload rows list both modes at the same rate. |
| Grok 4.20 Multi Agent 0309 | $1.25 | $0.20 | $2.50 | Separate multi-agent row with the same token rates. |
| Grok Build 0.1 / Grok Code Fast aliases | $1.00 | $0.20 | $2.00 | Current coding row. |
| Realtime API | n/a | n/a | $0.05 per minute | Also shown as $3 per hour. |
| Text to speech | n/a | n/a | $15 per 1M characters | Speech generation. |
| Speech to text | n/a | n/a | $0.10/hour REST; $0.20/hour streaming | Speech-to-text uses hourly units, not tokens. |
| Grok Imagine Image | $0.002 per input image; quality mode $0.01 per input image | n/a | Output starts at $0.02 per image; quality mode $0.05-$0.07 per image | Image generation units are per image, not per token. |
| Grok Imagine Video | Input $0.01/sec or $0.002/image | n/a | Output starts at $0.05-$0.07/sec; Video 1.5 is $0.08-$0.14/sec | Video generation units are per second or per image, depending on mode. |

xAI tools

| Service | Price | Portability |
| --- | --- | --- |
| Batch API | 50% discount across input, output, cached input, and reasoning tokens for supported text/language models | Batch queue is provider-only; outputs are portable. |
| Web Search | $5 per 1,000 calls | Returned links/snippets are portable; search tool state is xAI-only. |
| X Search | $5 per 1,000 calls | Returned content is portable; the service is xAI/X-specific. |
| Code Execution | $5 per 1,000 calls | Generated files/results travel; runtime state does not. |
| File Attachments search | $10 per 1,000 calls | Your files travel; attachment search state is provider-specific. |
| Collections Search | $2.50 per 1,000 calls | Collections are xAI-managed knowledge bases and are not cross-provider assets. |
| view_image / video tools | Token-based | Results travel; the tool path does not. |

xAI portability read

xAI is attractive for cheap search-grounded or tool-heavy experimentation. It becomes much less portable once Collections and native search workflows move from peripheral features to core product state.

Mistral

Mistral has the strongest portability story in this comparison because it repeatedly sells reusable artifacts: OCR output, embeddings, and open-weight models. You can prototype on hosted inference, then self-host or move across clouds later without redesigning the whole stack around a proprietary retrieval product.

Mistral hosted SKUs and utilities

| SKU | Input | Output | Notes |
| --- | --- | --- | --- |
| Mistral Large 3 | $0.50 | $1.50 | Flagship open-weight model. |
| Mistral Medium 3.5 | $1.50 | $7.50 | Current Medium row on Mistral's pricing page. |
| Mistral Small 4 | $0.10 | $0.30 | Cheap general text model. |
| Ministral 3 | 3B $0.10; 8B $0.15; 14B $0.20 | 3B $0.10; 8B $0.15; 14B $0.20 | Lightweight family; prices are input/output per 1M tokens. |
| Devstral Small 2 | $0.10 | $0.30 | Cheap code-oriented model. |
| Devstral 2 | $0.40 | $2.00 | Larger open-weight coding/agentic model. |
| Codestral | $0.30 | $0.90 | Code model. |
| Voxtral Small | $0.004 per audio minute; $0.10 text input | $0.40 | Audio-input model with both minute and token pricing exposed. |
| Voxtral TTS | $0.016 per 1,000 characters | n/a | Text-to-speech generation. |
| Mistral Embed | $0.10 per 1M tokens | n/a | Text embeddings. |
| Codestral Embed | $0.15 per 1M tokens | n/a | Code embeddings. |
| OCR 3 | $2 per 1,000 pages | n/a | Portable OCR / document extraction. |
| OCR 3 Annotated | $3 per 1,000 pages | n/a | Annotated OCR output. |
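
Because OCR is priced per page and embeddings per token, a document-ingestion budget mixes two units. This Python sketch estimates a one-time ingest cost from the OCR 3 and Mistral Embed rows above. The corpus size and the 500 tokens-per-page figure are hypothetical placeholders, and the function name is mine.

```python
OCR_PER_1000_PAGES = 2.00         # OCR 3, USD per 1,000 pages
EMBED_PER_MILLION_TOKENS = 0.10   # Mistral Embed, USD per 1M tokens


def ingest_cost_usd(pages, tokens_per_page):
    """One-time cost to OCR a corpus and embed the extracted text.

    Returns (ocr_cost, embedding_cost) in USD.
    """
    ocr = pages * OCR_PER_1000_PAGES / 1_000
    embed = pages * tokens_per_page * EMBED_PER_MILLION_TOKENS / 1_000_000
    return ocr, embed


# Hypothetical corpus: 100,000 pages averaging 500 tokens of extracted text each.
ocr, embed = ingest_cost_usd(pages=100_000, tokens_per_page=500)
print(f"OCR: ${ocr:.2f}, embeddings: ${embed:.2f}, total: ${ocr + embed:.2f}")
```

In this scenario OCR dominates the bill, and its output is plain extracted text that can be stored and reused with any provider, which is the portability point of this section.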

Mistral portability read

Mistral is the cleanest escape-hatch story in this market. If keeping medium-term switching costs low matters most, it is one of the strongest answers.

Cohere

Cohere remains interesting for enterprise text and retrieval-heavy work. Command A is not the cheapest flagship, but Command R and Command R7B are aggressively priced, and Cohere's broader product story still centers on retrieval, control, and private deployment rather than consumer-platform sprawl. The main caveat here is pricing visibility for some non-generation SKUs, so those rows stay conservative.

Cohere current public rows

| SKU | Input | Output | Notes |
| --- | --- | --- | --- |
| Command A | $2.50 | $10.00 | Current flagship enterprise text model. |
| Command A Reasoning | Not cleanly exposed in the public static docs view | Not cleanly exposed in the public static docs view | Public model exists; I did not verify a stable paid list price row in the static view. |
| Command A Vision | Not cleanly exposed in the public static docs view | Not cleanly exposed in the public static docs view | Public model exists; pricing row was not exposed in the snippets I could verify. |
| Command R | $0.15 | $0.60 | Competitive general text tier. |
| Command R7B | $0.0375 | $0.15 | Very low cost named enterprise model. |
| Command R+ 08-2024 | $2.50 | $10.00 | Older refresh row documented in Cohere's changelog. Treat as legacy unless your account still exposes it. |
| Rerank models | Query-priced | n/a | Cohere's public docs explain the pricing model, but the static public view I reviewed did not surface a stable current dollar figure on Cohere's own site. |
| Embed models | Token-priced | n/a | Same caveat: pricing model is public, but the static docs view I reviewed did not surface a stable current dollar figure on Cohere's own site. |

Cohere portability read

Cohere is relatively portable if you keep your own vector database or use private deployment. Its stickier value is in managed enterprise deployment and private hosting, not a giant public tooling layer.

DeepSeek

DeepSeek has one of the simplest serious price sheets in the market. It currently exposes a very low cache-hit price, a still-low cache-miss input price, and a low output price across both the non-thinking and thinking aliases. The pricing page is refreshingly direct.

The more strategically important point is API compatibility. DeepSeek explicitly supports OpenAI-compatible and Anthropic-compatible formats, which can lower migration cost even beyond the headline token savings.

DeepSeek current public rows

| SKU | Cache-miss input | Cache-hit input | Output | Notes |
| --- | --- | --- | --- | --- |
| deepseek-v4-flash / deepseek-chat / deepseek-reasoner | $0.14 | $0.0028 | $0.28 | Compatibility aliases map to DeepSeek-V4-Flash non-thinking and thinking modes. |
| deepseek-v4-pro | $0.435 | $0.003625 | $0.87 | Current public V4-Pro row. |
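
With a cache-hit price this far below the cache-miss price, the effective input rate depends almost entirely on the hit ratio. A minimal Python sketch of that blend, using the deepseek-v4-flash row as defaults, follows. The function name and the example hit ratios are illustrative assumptions.

```python
def blended_input_price(hit_ratio, miss_price=0.14, hit_price=0.0028):
    """Effective USD per 1M input tokens for a given cache-hit ratio (0.0 to 1.0).

    Defaults are the deepseek-v4-flash row from the table above.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_price + (1.0 - hit_ratio) * miss_price


for ratio in (0.0, 0.5, 0.9):
    price = blended_input_price(ratio)
    print(f"{ratio:.0%} cache hits -> ${price:.4f} per 1M input tokens")
```

The same function works for deepseek-v4-pro by passing `miss_price=0.435` and `hit_price=0.003625`.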

DeepSeek portability read

The cache is provider-specific, but the overall stack is unusually portable because the API surface mirrors other major ecosystems.

Qwen via Alibaba Cloud Model Studio

Qwen can look astonishingly cheap, but it is easy to read the pricing page incorrectly. Deployment mode, region, context tier, and thinking mode all affect the bill. There is no single "Qwen price"; there are several Qwen price surfaces, and the cheapest headline row usually belongs to a specific deployment mode and context bracket. That complexity does not make Qwen unattractive, but it does mean you have to read the table carefully.

Qwen deployment-mode overview

| Deployment mode | Qwen3-Max | Qwen3.5-Plus | Qwen3.5-Flash | Notes |
| --- | --- | --- | --- | --- |
| International | Starts at $1.20 / $6.00 | Starts at $0.40 / $2.40 | Starts at $0.10 / $0.40 | Higher-priced global commercial tier. |
| Global | Starts at $0.359 / $1.434 | Starts at $0.115 / $0.688 | Starts at $0.029 / $0.287 | Cheapest captured mode for current flagship rows. |
| US | n/a in captured rows | Qwen-Plus starts at $0.40 / $1.20 | Qwen-Flash starts at $0.05 / $0.40 | US uses qwen-plus and qwen-flash naming in the captured rows. |
| Chinese Mainland | Starts at $0.359 / $1.434 | Starts at $0.115 / $0.688 | Starts at $0.029 / $0.287 | Current mainland minima match the captured rows. |

Qwen detailed tiered rows captured from the public pricing page

| SKU | Tier | Input | Output | Notes |
| --- | --- | --- | --- | --- |
| qwen3-max / qwen3-max-2026-01-23 | 0-32K | $0.359 | $1.434 | Global row. |
| qwen3-max / qwen3-max-2026-01-23 | 32K-128K | $0.574 | $2.294 | Global row. |
| qwen3-max / qwen3-max-2026-01-23 | 128K-252K | $1.004 | $4.014 | Global row. |
| qwen3-max-2025-09-23 / qwen3-max-preview | 0-32K | $0.861 | $3.441 | Older, more expensive snapshot/previews. |
| qwen3-max-2025-09-23 / qwen3-max-preview | 32K-128K | $1.434 | $5.735 | Older, more expensive snapshot/previews. |
| qwen3-max-2025-09-23 / qwen3-max-preview | 128K-252K | $2.151 | $8.602 | Older, more expensive snapshot/previews. |
| qwen3.5-flash | 0-128K | $0.029 | $0.287 | Global row. Supports batch and context cache discounts. |
| qwen3.5-flash | 128K-256K | $0.115 | $1.147 | Global row. |
| qwen3.5-flash | 256K-1M | $0.172 | $1.720 | Global row. |
| qwen-flash-us | 0-256K | $0.05 | $0.40 | US row. Supports batch at half price. |
| qwen-flash-us | 256K-1M | $0.25 | $2.00 | US row. Supports batch at half price. |
| qwen3.5-plus | 0-256K | $0.40 | $2.40 | International row. |
| qwen3.5-plus | 256K-1M | $0.50 | $3.00 | International row. |
| qwen-plus non-thinking | 0-256K | $0.40 | $1.20 | International row. |
| qwen-plus non-thinking | 256K-1M | $1.20 | $3.60 | International row. |
| qwen-plus thinking | Tiered | Much higher than non-thinking mode | The public static capture did not expose the full row cleanly enough to reproduce every number with confidence | Thinking mode pricing exists and is materially higher. |
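
The context tiers matter more than they look at first glance. The Python sketch below prices a request against the three qwen3.5-flash Global tiers from the table. It rests on assumptions that should be checked against Alibaba Cloud's documentation: that "K" means 1,000 tokens, that each tier's upper bound is inclusive, and that the whole request (input and output) is billed at the tier selected by its input length. The names are mine.

```python
from bisect import bisect_left

# (tier upper bound in input tokens, input USD/1M, output USD/1M)
# for qwen3.5-flash Global, from the table above.
TIERS = [
    (128_000, 0.029, 0.287),
    (256_000, 0.115, 1.147),
    (1_000_000, 0.172, 1.720),
]
_UPPER_BOUNDS = [upper for upper, _, _ in TIERS]


def qwen_flash_cost(input_tokens, output_tokens):
    """USD cost of one request, assuming the input length selects the tier."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    idx = bisect_left(_UPPER_BOUNDS, input_tokens)  # first tier whose bound >= input
    if idx == len(TIERS):
        raise ValueError("input exceeds the largest listed context tier")
    _, in_price, out_price = TIERS[idx]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


short_prompt = qwen_flash_cost(100_000, 1_000)
long_prompt = qwen_flash_cost(130_000, 1_000)
print(f"100K-token prompt: ${short_prompt:.6f}")
print(f"130K-token prompt: ${long_prompt:.6f}")
```

Under these assumptions, a prompt that is 30% longer costs about five times as much because it crosses the 128K boundary, which is the kind of jump the headline "starts at" price hides.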

Commercial multimodal Qwen families listed without a clean numeric row in the captured static view

| Family | Status | Price visibility | Portability |
| --- | --- | --- | --- |
| Qwen-VL | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Model outputs travel; Model Studio deployment economics do not. |
| QVQ | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Same caveat. |
| Qwen-Omni | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Same caveat. |
| Qwen-Omni-Realtime | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Same caveat. |

Qwen portability read

Qwen is attractive if you want a very cheap hosted commercial tier now without giving up the broader strategic value of an open-source/open-weight family later.

A note on Meta and the Llama API

Meta's Llama ecosystem is strategically important and absolutely belongs in any serious conversation about portability, self-hosting, and open-weight alternatives. But I am not putting a numeric Llama API row into the main price comparison because, in the public docs view I could verify today, I could not confirm a stable public pay-as-you-go token sheet accessible without preview gating or login. That is a deliberate choice in favor of accuracy, not an accidental omission.

What can you actually reuse across providers?

This is the part many teams get wrong: they assume they are buying portable model access, when in practice they are buying a mix of portable artifacts and provider-only state.

Provider-by-provider portability map

| Provider | Outputs that travel well | Provider-bound services |
| --- | --- | --- |
| OpenAI | Text, JSON, code, transcripts, generated files, original uploaded files | File Search vector stores, Web Search session state, Containers, response/thread/runtime state |
| Anthropic | Text, JSON, tool results, fetched web content, generated files | Prompt cache, code execution container state, computer-use runtime state |
| Google | Text, grounded links/snippets, generated files, original uploaded files | Search grounding session state, Maps grounding, cache storage, File Search retrieval state |
| Amazon Nova / Bedrock | Text, generated assets, source files | Knowledge Bases, Guardrails, routing, prompt optimization workflow, browser/agent runtimes |
| xAI | Text, JSON, code, generated files, source attachments | Collections, attachment search state, tool/runtime state |
| Mistral | Text, OCR output, embeddings, open-weight checkpoints, source files | Very little compared with the rest; the main lock-in comes only if you let Mistral own retrieval instead of you |
| Cohere | Text, rerank scores, embeddings, private deployment artifacts | Managed public API usage is portable enough; Model Vault and private deployment reduce lock-in further |
| DeepSeek | Text, JSON, source files, cached prompt-independent outputs | Prompt cache is provider-specific, but the API surface is unusually portable because it mirrors other ecosystems |
| Qwen / Alibaba Cloud | Text, JSON, open-source/open-weight family artifacts, source files | Model Studio deployment mode economics, cache, and batch plumbing are Alibaba-specific |

What moves cleanly, what does not

| Artifact or service | Cross-provider? | Why | Examples |
| --- | --- | --- | --- |
| Raw outputs: text, JSON, code | Yes | They are just artifacts once you store them outside the vendor. | Any chat/completion output from any provider. |
| Source files: PDFs, CSVs, images, audio | Yes | Keep the originals in your own storage. | The same PDF can be sent to OpenAI, Anthropic, Google, Mistral, or xAI. |
| OCR and transcripts | Yes | Plain text, markdown, tables, and JSON are portable. | Mistral OCR output; OpenAI or Google transcripts; Whisper transcripts. |
| Embeddings and vectors | Partly | You can export vectors, but in practice you usually re-embed when you switch retrieval models so query and document embeddings stay aligned. | Gemini Embedding, Mistral Embed, Cohere Embed, OpenAI embedding models. |
| Prompt cache | No | Cache state is defined by the provider's own runtime and token accounting. | OpenAI cached input, Anthropic cache write/read, Google cache storage, DeepSeek cache-hit pricing. |
| Hosted file search / managed vector stores | No | You can move the source files, not the hosted index as a portable service. | OpenAI File Search, Google File Search, xAI Collections, Bedrock Knowledge Bases. |
| Grounded web or map search results | Partly | Returned links/snippets are reusable, but the tool path and session state are provider-specific. | OpenAI Web Search, Anthropic Web Search, Google Search/Maps grounding, xAI Web Search/X Search. |
| Code execution and containers | No, except for exported outputs | The runtime state lives inside the provider. | OpenAI Containers, Anthropic code execution, Google code execution, xAI code execution, Nova Act workflows. |
| Open-weight models and self-hosted checkpoints | Yes | You can move them across clouds or self-host them. | Mistral open-weight models, the wider Qwen family, Meta Llama families. |
| API client compatibility | Partly | If a provider mirrors another provider's API shape, migration is easier even if the model itself is different. | DeepSeek supports OpenAI-compatible and Anthropic-compatible formats. |

That table gives you a simple rule of thumb:

  • source artifacts usually travel
  • final outputs usually travel
  • provider-managed retrieval usually does not
  • prompt caches do not
  • runtimes and containers do not
  • embeddings are only partly portable in practice
  • open-weight checkpoints are the strongest long-term escape hatch
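
One way to apply that rule of thumb is to inventory a stack and flag everything that would not survive a provider switch. The Python sketch below is a hypothetical checklist, not a formal taxonomy: the category keys and the three labels are my own shorthand for the bullets above.

```python
# Portability label per component category, following the rule of thumb above.
PORTABILITY = {
    "source_files": "portable",
    "final_outputs": "portable",
    "open_weight_checkpoints": "portable",
    "embeddings": "partial",
    "managed_retrieval": "provider-bound",
    "prompt_cache": "provider-bound",
    "runtime_or_container": "provider-bound",
}


def sticky_components(stack):
    """Return the components of `stack` that are not fully portable, sorted by name."""
    unknown = [c for c in stack if c not in PORTABILITY]
    if unknown:
        raise KeyError(f"unclassified components: {unknown}")
    return sorted(c for c in stack if PORTABILITY[c] != "portable")


# Hypothetical product: own file storage, hosted file search, and prompt caching.
stack = ["source_files", "final_outputs", "managed_retrieval", "prompt_cache"]
print(sticky_components(stack))
```

The longer the returned list, the more of the migration is rebuild work rather than a change of API endpoint.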

A simple example: search-grounded pricing can invert the ranking

If your workflow calls search on every turn, the search line itself can dominate the bill surprisingly quickly.

| Provider | Search tool line only | What that means |
| --- | --- | --- |
| xAI | $5 per 1,000 calls | Cheapest clearly exposed first-party search tool fee in this review. |
| OpenAI | $10 per 1,000 calls for standard/reasoning preview modes, plus search content tokens | The tool line is moderate, but tokens still apply. |
| Anthropic | $10 per 1,000 searches, plus normal tokens | Very similar headline tool fee to OpenAI. |
| Google Gemini 3 | $14 per 1,000 search queries after free tier | Cheaper than Gemini 2.5 grounding, pricier than xAI and OpenAI/Anthropic. |
| Google Gemini 2.5 | $35 per 1,000 grounded prompts after free tier | Grounding is meaningfully more expensive here than on Gemini 3. |

That is before you add the model tokens. A provider that looks cheap on generation alone can become expensive in a search-heavy product, and a provider that looks expensive on generation alone can become reasonable if its tool line is cheaper or more predictable for your workload.
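
Here is a minimal Python sketch of that inversion, using three rows from the tables in this article. The workload is hypothetical: 2,000 input tokens and 500 output tokens per turn, with one search per turn. The sketch ignores free tiers, cached input, and the extra tokens that search results add to the prompt, and it treats one Gemini 2.5 grounded prompt as one search call.

```python
# (input USD/1M, output USD/1M, search fee USD per 1,000 calls), from this article's tables.
STACKS = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40, 35.00),
    "Grok 4.3": (1.25, 2.50, 5.00),
    "Sonnet 4.6": (3.00, 15.00, 10.00),
}


def cost_per_1000_turns(stack, input_tokens, output_tokens, searches_per_turn):
    """USD for 1,000 turns: model tokens plus the per-call search fee."""
    input_price, output_price, search_fee = STACKS[stack]
    tokens = 1_000 * (input_tokens * input_price
                      + output_tokens * output_price) / 1_000_000
    # The fee is quoted per 1,000 calls, and 1,000 turns make 1,000 * searches_per_turn calls.
    search = search_fee * searches_per_turn
    return tokens + search


def ranking(searches_per_turn):
    """Stack names, cheapest first, for the hypothetical 2,000-in / 500-out turn."""
    return sorted(STACKS,
                  key=lambda s: cost_per_1000_turns(s, 2_000, 500, searches_per_turn))


print("tokens only:        ", ranking(0))
print("one search per turn:", ranking(1))
```

On tokens alone, Gemini 2.5 Flash-Lite is the cheapest of the three at $0.40 per 1,000 turns. With one search per turn it becomes the most expensive at $35.40, behind Grok 4.3 at $8.75 and Sonnet 4.6 at $23.50.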

The practical buying guide

Here is the cleanest way to think about the market right now.

If you want the cheapest text possible

Start with the low-cost rows that are genuinely public and current: Qwen3.5-Flash Global, Cohere Command R7B, Amazon Nova Micro, Mistral Small 4, Devstral Small 2, Gemini 2.5 Flash-Lite, and DeepSeek. Then test the output quality you actually need. The interesting part is no longer finding a cheap model. It is finding a cheap model that does not force you into expensive surrounding services later.

If you want premium reasoning without instantly paying premium-platform tax

Gemini 3.1 Pro Preview, Grok 4.3, and Qwen3-Max all look more price-aggressive on list price than the top Anthropic and OpenAI premium rows. That does not tell you which one is best for your tasks. It tells you where to begin if you want to avoid assuming the most visible premium models are also the most financially sensible.

If you want the easiest long-context economics

Anthropic is the cleanest answer in the public sheets I reviewed because Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 keep 1M context at standard rates. OpenAI and Google can still be excellent choices, but their price jumps are explicit enough that you should model them before you architect around giant prompts.

If you want the lowest future switching pain

Mistral plus your own storage and retrieval stack is probably the cleanest answer. Anthropic with self-managed retrieval is also good. DeepSeek is especially interesting if you already have OpenAI-flavored or Anthropic-flavored client code and want lower migration friction. The deeper you go into File Search, Collections, Knowledge Bases, cache storage, containers, or provider-native search/grounding, the less meaningful model-layer portability becomes.

Common questions about AI token pricing

What actually determines the real cost of an LLM stack in 2026?

In this comparison, the real cost is a cost shape, not just a token rate. Input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, long-context repricing, search-grounding fees, hosted retrieval fees, document processing charges, and runtime costs can all change the final bill materially.

Is headline token price enough to compare providers?

No. Headline token pricing still matters, but on its own it is incomplete. Search, retrieval, file processing, runtime, and provider-owned state can all change which stack is actually cheaper for a real workload.

Which providers have the lowest future switching pain?

The shortlist in this article is Mistral plus your own storage and retrieval stack, Anthropic with self-managed retrieval, and DeepSeek if API compatibility matters to you. In each case, lower switching pain comes from keeping retrieval and long-lived state under your control.

Are embeddings and managed file search portable across providers?

Not in the same way. Source files, OCR output, transcripts, and final outputs travel well. Embeddings are only partly portable in practice because many teams re-embed when they switch retrieval models. Managed file search, hosted vector stores, caches, and runtime state are provider-specific.

Bottom line

By 2026, "token price" on its own is an incomplete buying metric. What matters is the full cost shape: model price, cache behavior, long-context behavior, tool pricing, retrieval pricing, file-processing pricing, runtime pricing, and ultimately how much provider-owned state your application accumulates.

If you care most about the lowest raw list price, the market now gives you several clear places to start testing. If you care most about premium managed stacks, OpenAI, Google, AWS, and xAI offer the deepest native ecosystems. If you care most about reducing future switching pain, Mistral, Anthropic with BYO retrieval, and DeepSeek remain the most strategically interesting combinations.

The most honest way to compare providers in 2026 is not to ask which model is cheapest. It is to ask which full cost shape you are actually willing to buy.

Official sources checked
