The real cost of generative AI tokens in 2026: the full provider-by-provider comparison
Most generative AI pricing roundups make the same mistake: they compare one flagship model per provider, quote a single input/output token rate, and treat that as the real bill.
By 2026, that is no longer enough. The real bill is a layered cost shape: input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, long-context tier jumps, search-grounding fees, hosted retrieval fees, OCR or document parsing fees, runtime charges, and sometimes separate browser automation or container pricing. This article compares the market the way buyers actually experience it: provider by provider, model by model, and mode by mode, while also separating portable value from provider-owned state.
Table of contents
- Methodology and scope
- What "mode" means in this article
- Quick answers by purpose
- The full 2026 AI token pricing comparison table
- Why file processing is often where the lock-in starts
- OpenAI
- Anthropic
- Google Gemini API
- Amazon Nova and Bedrock-side economics
- xAI
- Mistral
- Cohere
- DeepSeek
- Qwen via Alibaba Cloud Model Studio
- A note on Meta and the Llama API
- What can you actually reuse across providers?
- A simple example: search-grounded pricing can invert the ranking
- The practical buying guide
- Common questions about AI token pricing
- Bottom line
- Official sources checked
Methodology and scope
A few ground rules matter for the rest of the comparison:
- Verified facts: every numeric row below comes from official provider pricing documentation checked on 2026-06-17, unless a row explicitly says that the price was not cleanly exposed in the public static docs view.
- Unit convention: token prices are USD per 1M tokens unless the row explicitly says per minute, per hour, per image, per second, per page, per 1,000 calls, or another non-token unit. Cache-hit, cache-write, and cache-storage prices are labeled separately because they are not interchangeable with normal input tokens.
- Widely accepted expert consensus: total LLM cost is driven by more than plain input/output tokens. Caching, long context, retrieval, search, and runtime state change the economics materially.
- Informed inference: the portability and lock-in judgments are mine. They are not vendor claims.
- Speculation: none.
More specifically, this comparison covers current billable hosted SKUs and public priced modes from OpenAI, Anthropic, Google Gemini API, Amazon Nova, xAI, Mistral, Cohere, DeepSeek, and Qwen via Alibaba Cloud Model Studio. I am deliberately not mixing in Azure OpenAI, Vertex partner listings, or Bedrock resales of third-party models, because those are channel decisions layered on top of the underlying model economics.
I also need to be explicit about two boundaries. First, I am not exploding every historical alias, retired snapshot, or open-weight download artifact unless the provider still exposes it as a billable public row. Second, parts of the AWS Nova pricing page and parts of xAI's pricing matrix render dynamically. Where the official public page or official search snippet exposed a clean number, I include it. Where it did not, I mark the gap instead of guessing. Later sections keep those confidence notes short for the same reason.
Finally, this is a pricing-architecture article, not a benchmark ranking. A cheaper model is not automatically a better value for your use case, and an expensive model is not automatically overpriced. What follows is a cost comparison, not a capability leaderboard.
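The unit convention above can be sketched as a small calculator. This is a minimal illustration, not a billing tool: the `TokenRates` and `layered_bill` names and the workload sizes are hypothetical, while the rates are the GPT-5.4 mini row and the OpenAI File Search tool-call fee quoted later in this article.

```python
from dataclasses import dataclass

PER_MILLION = 1_000_000


@dataclass(frozen=True)
class TokenRates:
    """USD per 1M tokens, following the article's unit convention."""
    input: float
    cached_input: float
    output: float


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Convert a raw token count into USD at a per-1M-token rate."""
    return tokens / PER_MILLION * usd_per_million


def layered_bill(rates: TokenRates, fresh_in: int, cached_in: int, out: int,
                 tool_calls: int = 0, usd_per_1k_calls: float = 0.0) -> dict:
    """Keep each cost layer separate, because the units are not interchangeable."""
    parts = {
        "input": token_cost(fresh_in, rates.input),
        "cached_input": token_cost(cached_in, rates.cached_input),
        "output": token_cost(out, rates.output),
        "tool_calls": tool_calls / 1_000 * usd_per_1k_calls,  # per-1,000-call unit
    }
    parts["total"] = sum(parts.values())
    return parts


# GPT-5.4 mini list rates from the comparison table; workload sizes are made up.
mini = TokenRates(input=0.75, cached_input=0.075, output=4.50)
bill = layered_bill(mini, fresh_in=10_000_000, cached_in=40_000_000,
                    out=5_000_000, tool_calls=20_000, usd_per_1k_calls=2.50)
print(bill)
```

In this made-up workload the tool-call line is larger than all three token lines combined, which is the kind of result a single input/output rate hides.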
What "mode" means in this article
A provider no longer sells just "a model." It often sells several pricing modes around the same model:
- standard vs batch/flex vs priority
- reasoning vs non-reasoning
- short-context vs long-context tier
- text vs realtime vs audio vs image vs video
- preview/beta vs general availability
- self-contained inference vs provider-native tools such as search, file search, collections, code execution, or browser automation
That is why the tables below are wider than the usual blog post table. If you compress all of that into one number, you are not doing a comparison. You are doing marketing.
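To make the point concrete, here is a rough sketch that prices one workload under three purchase modes of the same model. The rates are the GPT-5.5 standard, Batch/Flex, and Priority rows from the comparison table; the workload mix and the `bill` helper are assumptions for illustration only.

```python
# (input, cached input, output) in USD per 1M tokens, short-context rows.
GPT_5_5_MODES = {
    "standard": (5.00, 0.50, 30.00),
    "batch_flex": (2.50, 0.25, 15.00),
    "priority": (12.50, 1.25, 75.00),
}


def bill(mode: str, fresh_in_m: float, cached_in_m: float, out_m: float) -> float:
    """Cost in USD for token volumes expressed in millions of tokens."""
    in_rate, cached_rate, out_rate = GPT_5_5_MODES[mode]
    return fresh_in_m * in_rate + cached_in_m * cached_rate + out_m * out_rate


# Hypothetical workload: 2M fresh input, 6M cached input, 1M output tokens.
workload = {"fresh_in_m": 2.0, "cached_in_m": 6.0, "out_m": 1.0}
bills = {mode: bill(mode, **workload) for mode in GPT_5_5_MODES}

for mode, usd in bills.items():
    print(f"{mode:>10}: ${usd:,.2f}")
```

The same model and the same tokens produce a 5x spread between the cheapest and most expensive mode here, before any tool or long-context charges are added.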
Quick answers by purpose
If you only need the conclusions, start here.
| Purpose | Best raw price floor | Best managed stack | Best portability | Main catch |
|---|---|---|---|---|
| Cheapest commodity text generation | No single winner. Qwen3.5-Flash Global starts at $0.029 in / $0.287 out, Cohere Command R7B is $0.0375 / $0.15, and Amazon Nova Micro is $0.035 / $0.14. | Google Gemini 2.5 Flash-Lite or OpenAI GPT-5.4 nano if you want a large surrounding platform, not just a cheap model. | Mistral Small 4, DeepSeek, or Qwen if you keep storage and retrieval outside the model vendor. | Output quality, reasoning depth, context tiers, and tool fees can erase the headline savings. |
| Premium reasoning | On list price, Qwen3-Max Global, xAI Grok 4.3, and Gemini 3.1 Pro Preview look far cheaper than GPT-5.5 Pro and Claude Fable/Mythos 5. | OpenAI GPT-5.5 plus native tools, Gemini 3.1 Pro plus Search/Maps/File Search, or Claude Opus 4.8 plus managed-agent runtime. | Anthropic Sonnet 4.6 with your own retrieval, or DeepSeek if API compatibility matters more than platform depth. | This article compares cost architecture, not benchmark quality. Cheap frontier pricing is not a performance verdict. |
| Long-context work | Anthropic Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 keep 1M context at standard rates. | Gemini 3.1 Pro if you also want grounding, Maps, multimodal tools, and Gemini-native agent features. | Anthropic or Mistral with self-managed document storage. | OpenAI still has explicit short/long-context price tiers; Gemini Pro tiers change above 200K input. |
| Search-grounded answers | xAI Web Search is $5 per 1,000 calls. OpenAI and Anthropic are $10. Gemini 3 search is $14, and Gemini 2.5 grounding is $35. | Google if you want Search plus Maps and file tools in one stack. | Bring your own search layer, or treat search results as portable artifacts and not as provider session state. | Search fees are extra. They sit on top of the model tokens. |
| OCR and document extraction | Mistral OCR 3 at $2 per 1,000 pages, or $3 per 1,000 annotated pages. | OpenAI, Google, xAI, and AWS all offer document workflows, but they are usually retrieval stacks rather than plain OCR products. | Mistral OCR or any transcript/OCR pipeline that returns plain text, markdown, tables, or JSON. | File Search, Collections, and Knowledge Bases are not the same thing as OCR. They usually create provider-managed retrieval state. |
| Realtime voice | xAI Realtime is $0.05 per minute, while OpenAI gpt-realtime-2 is token-priced and OpenAI realtime translate/whisper are minute-priced. | OpenAI and Google both expose richer native multimodal realtime paths than the public price sheet alone suggests. | Transcripts travel. Realtime session state does not. | Per-minute, per-hour, and per-token realtime pricing are not directly comparable without usage assumptions. |
| Lowest switching pain | DeepSeek, Mistral small models, and Qwen are all good starting points if you keep the rest of the stack under your control. | None. Managed stacks are exactly where lock-in grows. | Mistral plus your own storage/vector DB, Anthropic with BYO retrieval, or DeepSeek because it supports OpenAI-style and Anthropic-style API formats. | The provider model is usually not the sticky part. The sticky part is retrieval, cache, search, and runtime state. |
The fastest way to misprice an LLM stack is to confuse four different things:
- generating text
- processing files
- retrieving from files later
- running the rest of the workflow around the model
Those are not the same product, and providers do not price them the same way.
The full 2026 AI token pricing comparison table
All prices below are public USD list prices. Unless otherwise noted, token prices are per 1M tokens.
The Category column is there so you can filter the table by product type such as chat, voice, image, video, embeddings, OCR, or agent tooling.
| Provider | Category | Model or mode | Public list price | Notes |
|---|---|---|---|---|
| OpenAI | Chat | GPT-5.5 standard | $5 in / $0.50 cached / $30 out per 1M tokens | Long-context tier is $10 / $1 / $45. |
| OpenAI | Chat | GPT-5.5 Batch/Flex | $2.50 in / $0.25 cached / $15 out | Long-context tier is $5 / $0.50 / $22.50. |
| OpenAI | Chat | GPT-5.5 Priority | $12.50 in / $1.25 cached / $75 out | Premium latency mode; no separate long-context row was exposed in the current public table. |
| OpenAI | Chat | GPT-5.5 Pro standard | $30 in / n/a cached / $180 out | Long-context tier is $60 in / $270 out. |
| OpenAI | Chat | GPT-5.5 Pro Batch/Flex | $15 in / n/a cached / $90 out | Current public table exposes only the short-context Batch/Flex row. |
| OpenAI | Chat | GPT-5.4 standard | $2.50 in / $0.25 cached / $15 out | Long-context tier is $5 / $0.50 / $22.50. |
| OpenAI | Chat | GPT-5.4 Pro standard | $30 in / n/a cached / $180 out | Long-context tier is $60 in / $270 out. Batch/Flex is $15 / $90 short context and $30 / $135 long context. |
| OpenAI | Chat | GPT-5.4 mini | $0.75 in / $0.075 cached / $4.50 out | Batch/Flex is $0.375 / $0.0375 / $2.25; Priority is $1.50 / $0.15 / $9. |
| OpenAI | Chat | GPT-5.4 nano | $0.20 in / $0.02 cached / $1.25 out | Cheapest current OpenAI general text model in the public pricing table. Batch/Flex is $0.10 / $0.01 / $0.625. |
| OpenAI | Voice | gpt-realtime-2 | Text $4 / $0.40 / $24; audio $32 / $0.40 / $64; image input $5 / $0.50 | Prices are per 1M tokens unless noted. |
| OpenAI | Voice | gpt-realtime-translate | $0.034 per minute | Minute-priced live translation model. |
| OpenAI | Voice | gpt-realtime-whisper | $0.017 per minute | Minute-priced streaming speech-to-text model. |
| OpenAI | Image | GPT Image 2 | Text $5 / $1.25 / n/a; image $8 / $2 / $30 per 1M tokens | Batch halves image rates to $4 / $1 / $15 and text input to $2.50 / $0.625. |
| OpenAI | Image | GPT Image 1.5 | Text $5 / $1.25 / $10; image $8 / $2 / $32 per 1M tokens | Batch halves image rates to $4 / $1 / $16 and text to $2.50 / $0.63 / $5. |
| OpenAI | Image | GPT Image 1 mini | Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 | Batch row is $1 / $0.10 text input/cached and $1.25 / $0.13 / $4 image. |
| OpenAI | Video | Sora 2 | $0.10 per second | Video generation. Not token-priced. |
| OpenAI | Video | Sora 2 Pro | 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec | Batch halves these rows to $0.15 / $0.25 / $0.35 per second. |
| OpenAI | Voice | gpt-4o-transcribe / diarize | $2.50 in / $10 out per 1M tokens; about $0.006 per minute | Speech-to-text rows are token-priced; the per-minute figure is an estimated equivalent, not an extra charge. |
| OpenAI | Voice | gpt-4o-mini-transcribe | $1.25 in / $5 out per 1M tokens; about $0.003 per minute | Cheaper transcription tier. |
| OpenAI | Embeddings | text-embedding models | Current public pricing page reviewed did not expose a clean embedding table | Do not carry older embedding prices into estimates without verifying the model-specific price in your account or docs. |
| Anthropic | Chat | Claude Fable 5 | $10 in / $12.50 5m cache write / $20 1h cache write / $1 cache read / $50 out | 1M context stays at standard rates. |
| Anthropic | Chat | Claude Mythos 5 | $10 / $12.50 / $20 / $1 / $50 | Limited availability row. |
| Anthropic | Chat | Claude Opus 4.8 | $5 / $6.25 / $10 / $0.50 / $25 | 1M context stays at standard rates. Fast mode is $10 in / $50 out. |
| Anthropic | Chat | Claude Opus 4.7 | $5 / $6.25 / $10 / $0.50 / $25 | 1M context stays at standard rates. Fast mode is $30 in / $150 out. |
| Anthropic | Chat | Claude Opus 4.6 | $5 / $6.25 / $10 / $0.50 / $25 | 1M context stays at standard rates. Fast mode is $30 in / $150 out. |
| Anthropic | Chat | Claude Opus 4.5 | $5 / $6.25 / $10 / $0.50 / $25 | Same base list price as Opus 4.6 and 4.7. |
| Anthropic | Chat | Claude Opus 4.1 | $15 / $18.75 / $30 / $1.50 / $75 | Deprecated row still listed. |
| Anthropic | Chat | Claude Opus 4 | $15 / $18.75 / $30 / $1.50 / $75 | Retired except on Vertex AI. |
| Anthropic | Chat | Claude Sonnet 4.6 | $3 / $3.75 / $6 / $0.30 / $15 | 1M context stays at standard rates. |
| Anthropic | Chat | Claude Sonnet 4.5 | $3 / $3.75 / $6 / $0.30 / $15 | Still listed. |
| Anthropic | Chat | Claude Sonnet 4 | $3 / $3.75 / $6 / $0.30 / $15 | Retired except on Bedrock and Vertex AI. |
| Anthropic | Chat | Claude Haiku 4.5 | $1 / $1.25 / $2 / $0.10 / $5 | Cheapest current Claude 4.x tier. |
| Anthropic | Chat | Claude Haiku 3.5 | $0.80 / $1.00 / $1.60 / $0.08 / $4 | Retired except on Bedrock and Vertex AI. |
| Google | Chat | Gemini 3.5 Flash | Text/image/video $1.50 in; audio not separately exposed in the standard table; $9 out | Cache tokens $0.15, storage $1.50 per 1M tokens/hour, batch/flex $0.75 in / about $4.50 out, priority $2.70 in / $16.20 out. |
| Google | Voice | Gemini 3.5 Live Translate | $3.50 audio in / $21 audio out per 1M tokens | Google also presents approximate minute equivalents: $0.0053/min input and $0.0315/min output. |
| Google | Chat | Gemini 3.1 Pro Preview | <=200K: $2 in / $12 out; >200K: $4 / $18 | Cache tokens $0.20 or $0.40; cache storage $4.50 per 1M tokens per hour; batch/flex halves input/output; Search and Maps are $14 per 1,000 queries after free tier. |
| Google | Chat | Gemini 3 Flash Preview | Text/image/video $0.50 in; audio $1.00 in; $3 out | Cache tokens $0.05 or $0.10; storage $1 per 1M tokens per hour; batch/flex input $0.25 or $0.50 audio and output $1.50. |
| Google | Chat | Gemini 3.1 Flash-Lite | Text/image/video $0.25 in; audio $0.50 in; $1.50 out | Cache tokens $0.025 or $0.05; storage $0.50 per 1M tokens per hour; batch/flex halves input/output. |
| Google | Image | Gemini 3.1 Flash Image | Text/image input $0.50; text+thinking output $3 | Batch input/output is $0.25 / $1.50. |
| Google | Image | Gemini 3 Pro Image | Text/image input $2.00; text+thinking output $12 | Batch/flex input/output is $1 / $6; priority is $3.60 / $21.60. |
| Google | Chat | Gemini 2.5 Pro | <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 | Cache tokens $0.125 or $0.25; storage $4.50/hr; batch halves input/output; Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000. |
| Google | Chat | Gemini 2.5 Flash | Text/image/video $0.30 in; audio $1.00 in; $2.50 out | Cache tokens $0.03 or $0.10; storage $1/hr; batch input $0.15 or $0.50 audio and output $1.25. |
| Google | Chat | Gemini 2.5 Flash-Lite | Text/image/video $0.10 in; audio $0.30 in; $0.40 out | Cache tokens $0.01 or $0.03; storage $1/hr; batch input $0.05 or $0.15 audio and output $0.20. |
| Google | Chat | Gemini 2.5 Flash-Lite Preview | Same list price as Gemini 2.5 Flash-Lite | Preview row still separately listed. |
| Google | Voice | Gemini 2.5 Flash Native Audio | Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 | Live API row. |
| Google | Image | Gemini 2.5 Flash Image | Text/image input $0.30; image output $0.039 per image | Batch input $0.15; image output $0.0195 per image. |
| Google | Voice | Gemini 2.5 Flash Preview TTS | Text input $0.50; audio output $10 | Batch $0.25 in / $5 out. |
| Google | Voice | Gemini 2.5 Pro Preview TTS | Text input $1.00; audio output $20 | Batch $0.50 in / $10 out. |
| Google | Agent | Gemini Robotics-ER 1.6 Preview | Text/image/video $1.00 in; audio $2.00 in; $5.00 out | Batch row is $0.50 or $1.00 audio input and $2.50 output. Search grounding is $14 per 1,000 search queries after free tier. |
| Google | Agent | Gemini 2.5 Computer Use Preview | <=200K: $1.25 in / $10 out; >200K: $2.50 / $15 | Specialized computer-use SKU. |
| Google | Chat | Gemini 2.0 Flash | Text/image/video $0.10 in; audio $0.70 in; $0.40 out | Deprecated, shutdown June 1, 2026. Batch $0.05 or $0.35 audio and $0.20 output. |
| Google | Chat | Gemini 2.0 Flash-Lite | $0.075 in / $0.30 out | Deprecated, shutdown June 1, 2026. Batch $0.0375 / $0.15. |
| Google | Embeddings | Gemini Embedding 2 | Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 | Batch halves each modality. Google also gives image/audio/video equivalent units. |
| Google | Embeddings | Gemini Embedding 001 | $0.15 per 1M text tokens | Batch $0.075. |
| Google | Image | Imagen 4 Fast | $0.02 per image | Image generation. |
| Google | Image | Imagen 4 Standard | $0.04 per image | Image generation. |
| Google | Image | Imagen 4 Ultra | $0.06 per image | Image generation. |
| Google | Video | Veo 3.1 Standard | 720p/1080p $0.40/sec; 4K $0.60/sec | Video generation. |
| Google | Video | Veo 3.1 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Cheaper video tier. |
| Google | Video | Veo 3.1 Lite | 720p $0.05/sec; 1080p $0.08/sec | Lowest current Veo 3.1 public row. |
| Google | Video | Veo 3 Standard | $0.40 per second | Older video generation row. |
| Google | Video | Veo 3 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Older fast video row. |
| Google | Video | Veo 2 | $0.35 per second | Older video generation row. |
| Google | Chat | Gemma 4 | No paid list price in the captured pricing page; free-tier rows only | Important strategically, but not a normal paid API row in the page reviewed. |
| Amazon Nova | Chat | Nova Micro | $0.035 in / $0.14 out | The cheapest clearly exposed Nova text tier in the official public pricing snippets. |
| Amazon Nova | Chat | Nova Lite | $0.06 in / $0.24 out | Low-cost general Nova tier. |
| Amazon Nova | Chat | Nova Pro | $0.80 in / $3.20 out | Base Pro tier. |
| Amazon Nova | Chat | Nova Pro latency optimized | $1.00 in / $4.00 out | Lower latency premium mode. |
| Amazon Nova | Chat | Nova 2 Lite | Pricing page snippet showed $0.30 input / $2.50 output | The public static view did not expose the full labeled row cleanly enough for a more detailed reproduction. |
| Amazon Nova | Chat | Nova 2 Pro Preview | Pricing page snippet showed a row beginning with repeated $2.1875 values and $17.50 output | The static public view did not expose the table labels cleanly enough to quote the full row with confidence. |
| Amazon Nova | Chat | Nova 2 Omni Preview | Product listed; public static docs view did not expose a clean list price | Exists on the Nova model catalog page. |
| Amazon Nova | Voice | Nova 2 Sonic | Product listed; public static docs view did not expose a clean list price | Speech model family. |
| Amazon Nova | Chat | Nova Premier | Product listed; public static docs view did not expose a clean list price | Most capable multimodal understanding model. |
| Amazon Nova | Image | Nova Canvas | Product listed; public static docs view did not expose a clean list price | Image generation. |
| Amazon Nova | Video | Nova Reel | Product listed; public static docs view did not expose a clean list price | Video generation. |
| Amazon Nova | Embeddings | Nova Multimodal Embeddings | Product listed; public static docs view did not expose a clean list price | Embedding model. |
| Amazon Nova | Agent | Nova Act | $4.75 per agent-hour | Browser automation / computer-use workflow pricing. |
| Amazon Nova | Tooling | Nova Forge | Annual subscription; public price not disclosed on the public page | Model-building service, not token-priced. |
| Amazon Nova | Tooling | Bedrock Prompt Optimization | $0.03 per 1,000 tokens | This is a platform-side add-on, not a model. |
| xAI | Chat | Grok 4.3 | $1.25 in / $0.20 cached / $2.50 out | Current main chat row; the pricing page also maps current Grok 4.20 rows to the same list price. |
| xAI | Agent | Grok 4.20 Multi Agent 0309 | $1.25 in / $0.20 cached / $2.50 out | Separate current pricing row with same token rates. |
| xAI | Chat | Grok 4.20 0309 reasoning / non-reasoning | $1.25 in / $0.20 cached / $2.50 out | Both rows are listed at the same public token price. |
| xAI | Coding | Grok Build 0.1 / Grok Code Fast aliases | $1.00 in / $0.20 cached / $2.00 out | Code-specialized current row. |
| xAI | Voice | Realtime API | $0.05 per minute, or $3 per hour | Realtime voice billing line. |
| xAI | Voice | Text to speech | $15 per 1M characters | Speech generation. |
| xAI | Voice | Speech to text | $0.10/hour REST; $0.20/hour streaming | Speech-to-text uses hourly units, not tokens. |
| xAI | Image | Grok Imagine Image | $0.002 per input image; output starts at $0.02 per image | Quality mode lists $0.01 input image and $0.05-$0.07 output image depending on resolution. |
| xAI | Video | Grok Imagine Video | Input $0.01/sec or $0.002/image; output starts at $0.05/sec | Video 1.5 lists $0.01 input image and $0.08-$0.14/sec output depending on resolution. |
| Mistral | Chat | Mistral Large 3 | $0.50 in / $1.50 out | Open-weight model; strong portability story. |
| Mistral | Chat | Mistral Medium 3.5 | $1.50 in / $7.50 out | Current Medium row on Mistral's pricing page. |
| Mistral | Chat | Mistral Small 4 | $0.10 in / $0.30 out | Cheap general text tier. |
| Mistral | Chat | Ministral 3 | 3B $0.10 / $0.10; 8B $0.15 / $0.15; 14B $0.20 / $0.20 | Edge/lightweight family; prices are input/output per 1M tokens. |
| Mistral | Chat | Devstral Small 2 | $0.10 in / $0.30 out | Code-oriented small model. |
| Mistral | Chat | Devstral 2 | $0.40 in / $2.00 out | Larger open-weight coding/agentic row. |
| Mistral | Chat | Codestral | $0.30 in / $0.90 out | Code model. |
| Mistral | Voice | Voxtral Small | $0.004 per audio minute; $0.10 text input / $0.40 output | Audio-input model with both minute and token pricing exposed. |
| Mistral | Voice | Voxtral TTS | $0.016 per 1,000 characters | Text-to-speech generation. |
| Mistral | Embeddings | Mistral Embed | $0.10 per 1M tokens | Text embeddings. |
| Mistral | Embeddings | Codestral Embed | $0.15 per 1M tokens | Code embeddings. |
| Mistral | OCR | OCR 3 | $2 per 1,000 pages | Portable document extraction product. |
| Mistral | OCR | OCR 3 Annotated | $3 per 1,000 pages | Annotated OCR output. |
| Cohere | Chat | Command A | $2.50 in / $10.00 out | Flagship enterprise text model. |
| Cohere | Chat | Command A Reasoning | Model is live; the public docs view I could verify did not expose a stable paid list price | Public docs list the model but not a clean price row in the static view I reviewed. |
| Cohere | Multimodal | Command A Vision | Model is live; the public docs view I could verify did not expose a stable paid list price | Multimodal Command variant. |
| Cohere | Chat | Command R | $0.15 in / $0.60 out | Very competitive general text tier. |
| Cohere | Chat | Command R7B | $0.0375 in / $0.15 out | One of the cheapest named enterprise text models in public docs. |
| Cohere | Chat | Command R+ 08-2024 | $2.50 in / $10.00 out | Older refresh row documented in Cohere's changelog. Treat as legacy unless the live model catalog still lists it for your account. |
| Cohere | Rerank | Rerank models | Query-priced rather than token-priced | The public docs explain the unit, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site. |
| Cohere | Embeddings | Embed models | Token-priced rather than generation-priced | The public docs explain the pricing model, but the static docs view I reviewed did not surface a stable public dollar figure on Cohere's own site. |
| DeepSeek | Chat | deepseek-v4-flash / deepseek-chat / deepseek-reasoner | $0.14 cache-miss in / $0.0028 cache-hit in / $0.28 out | Compatibility aliases map to DeepSeek-V4-Flash modes and are scheduled for deprecation on 2026-07-24 15:59 UTC. |
| DeepSeek | Chat | deepseek-v4-pro | $0.435 cache-miss in / $0.003625 cache-hit in / $0.87 out | Current public V4-Pro row. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max Global | Starts at $0.359 in / $1.434 out | Tiered by context: 0-32K, 32K-128K, 128K-252K. Batch and cache discounts available. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max International | Starts at $1.20 in / $6.00 out | Higher-priced deployment mode. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max Chinese Mainland | Starts at $0.359 in / $1.434 out | Minima match the captured mainland row. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Plus Global | Starts at $0.115 in / $0.688 out | Deployment mode and context tier both matter. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Plus International | Starts at $0.40 in / $2.40 out | 0-256K row shown in the public pricing page. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Flash Global | Starts at $0.029 in / $0.287 out | Tiered by context: 0-128K, 128K-256K, 256K-1M. Batch and cache discounts available. |
| Qwen / Alibaba Cloud | Chat | Qwen3.5-Flash International | Starts at $0.10 in / $0.40 out | Higher-priced deployment mode. |
| Qwen / Alibaba Cloud | Chat | Qwen-Plus US | Starts at $0.40 in / $1.20 out in non-thinking mode | The public page also shows a much more expensive thinking mode, but the static capture did not expose the entire row cleanly enough to reproduce every number with confidence. |
| Qwen / Alibaba Cloud | Chat | Qwen-Flash US | Starts at $0.05 in / $0.40 out | Tiered by context: 0-256K and 256K-1M. Batch is half price where supported. |
| Qwen / Alibaba Cloud | Chat | Qwen3-Max snapshots | Older snapshot rows are more expensive than the current qwen3-max | Examples on the public page include qwen3-max-2025-09-23 and qwen3-max-preview. |
| Qwen / Alibaba Cloud | Multimodal | Qwen-VL / QVQ / Qwen-Omni / Qwen-Omni-Realtime | Commercial multimodal families are publicly listed; the static page excerpts I reviewed did not expose all of their numeric price rows | Do not assume the text-model prices apply to multimodal SKUs. |
A few patterns jump out immediately.
First, the cheapest raw text generation is not concentrated in one camp. Qwen3.5-Flash Global pushes the input floor very low. Cohere Command R7B and Amazon Nova Micro are also extremely cheap on output. Mistral Small 4 and Devstral Small 2 are still low enough to matter, and Gemini 2.5 Flash-Lite remains a serious budget option. DeepSeek stays attractive because its current price sheet is simple, cheap, and aggressively cached.
Second, premium reasoning is now surprisingly heterogeneous. Anthropic Fable 5 and Mythos 5 are expensive at $10 in / $50 out. OpenAI GPT-5.5 Pro and GPT-5.4 Pro cost more still at $30 in / $180 out. Gemini 3.1 Pro Preview and xAI Grok 4.3 look cheaper on list price, and Qwen3-Max looks cheaper still in the captured Global mode rows. That does not settle the capability question, but it absolutely changes the budget conversation.
Third, "the cheapest provider" is often not the provider with the cheapest whole system. Search-grounded applications, document-heavy workflows, and agentic runtimes can shift the ranking once you add tool pricing, retrieval fees, or runtime charges. The next section shows one of the clearest examples: file handling and retrieval.
Why file processing is often where the lock-in starts
Providers often blur three very different offerings under the phrase "work with your files":
- file parsing or OCR
- embeddings plus retrieval
- provider-managed file search or knowledge-base state
Those have very different portability properties.
If you pay for OCR and receive plain text, markdown, tables, or JSON, you can usually send that output to any other model later. That is portable value.
If you pay for embeddings and keep the vectors in your own vector database, that is partly portable. The data can move, but in practice many teams re-embed when they switch retrieval models so query and document vectors stay in the same space.
If you pay for a provider-managed file search layer, collections product, or knowledge-base service, the useful thing you are buying is no longer just the file. You are buying provider-owned retrieval state. Your PDF may be portable, but the managed index, cache, search path, and tool wiring are not.
That distinction matters more than most "token pricing" articles admit.
OpenAI
OpenAI is no longer just a model vendor. It is a managed runtime that layers retrieval, search, image generation, video generation, containers, speech, embeddings, and file infrastructure around the core inference API. That convenience is part of the appeal, but it also means the real cost of an OpenAI build can drift well beyond the headline model rate.
The biggest OpenAI pricing gotcha in 2026 is still long context. The current public sheet exposes separate short-context and long-context rates for the GPT-5.5 and GPT-5.4 families. That is a real pricing tier, not a minor footnote.
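As a rough illustration of how much the tier matters, the sketch below prices one large request under both GPT-5.5 standard rows. The rates come from the tables in this article. The request size is hypothetical, and the token threshold that triggers the long-context row is deliberately left out because this section does not state it. Whether the long-context rate applies to the whole request or only part of it is also not stated here, so the sketch simply prices the full request under each row.

```python
# GPT-5.5 standard rows, USD per 1M tokens (input, output).
SHORT_CONTEXT = {"input": 5.00, "output": 30.00}
LONG_CONTEXT = {"input": 10.00, "output": 45.00}


def request_cost(rates: dict, input_tokens: int, output_tokens: int) -> float:
    """Price a single uncached request under one rate row."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical document-heavy request: 400K input tokens, 4K output tokens.
short_cost = request_cost(SHORT_CONTEXT, 400_000, 4_000)
long_cost = request_cost(LONG_CONTEXT, 400_000, 4_000)

input_multiplier = LONG_CONTEXT["input"] / SHORT_CONTEXT["input"]
output_multiplier = LONG_CONTEXT["output"] / SHORT_CONTEXT["output"]

print(f"short-context row: ${short_cost:.2f}")
print(f"long-context row:  ${long_cost:.2f}")
print(f"input x{input_multiplier:.1f}, output x{output_multiplier:.1f}")
```

Because long-context requests are input-heavy by nature, the effective uplift on this kind of request sits close to the 2x input multiplier rather than the 1.5x output multiplier.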
OpenAI's public pricing still includes older and specialized rows. The tables below focus on the current GPT-5.5/GPT-5.4 stack plus the image, speech, and tool rows that matter most for real bill estimation.
OpenAI text models and purchase modes
| SKU | Mode | Input | Cached input | Output | Notes |
|---|---|---|---|---|---|
| GPT-5.5 | Standard short context | $5.00 | $0.50 | $30.00 | Current flagship row. |
| GPT-5.5 | Standard long context | $10.00 | $1.00 | $45.00 | Long-context tier. |
| GPT-5.5 | Batch/Flex short context | $2.50 | $0.25 | $15.00 | Half of standard short-context pricing. |
| GPT-5.5 | Batch/Flex long context | $5.00 | $0.50 | $22.50 | Half of standard long-context pricing. |
| GPT-5.5 | Priority | $12.50 | $1.25 | $75.00 | Premium latency mode. |
| GPT-5.5 Pro | Standard short context | $30.00 | n/a | $180.00 | Highest-end OpenAI text tier. |
| GPT-5.5 Pro | Standard long context | $60.00 | n/a | $270.00 | Long-context tier. |
| GPT-5.5 Pro | Batch/Flex short context | $15.00 | n/a | $90.00 | Current table does not expose a separate long-context Batch/Flex row. |
| GPT-5.4 | Standard short context | $2.50 | $0.25 | $15.00 | Previous flagship row, still priced. |
| GPT-5.4 | Standard long context | $5.00 | $0.50 | $22.50 | Long-context tier. |
| GPT-5.4 Pro | Standard short context | $30.00 | n/a | $180.00 | Previous Pro row, still priced. |
| GPT-5.4 Pro | Standard long context | $60.00 | n/a | $270.00 | Long-context tier. |
| GPT-5.4 Pro | Batch/Flex short context | $15.00 | n/a | $90.00 | Half of standard short-context pricing. |
| GPT-5.4 Pro | Batch/Flex long context | $30.00 | n/a | $135.00 | Half of standard long-context pricing. |
| GPT-5.4 mini | Standard | $0.75 | $0.075 | $4.50 | Batch/Flex is $0.375 / $0.0375 / $2.25. |
| GPT-5.4 nano | Standard | $0.20 | $0.02 | $1.25 | Cheapest current OpenAI general text row in the public table. Batch/Flex is $0.10 / $0.01 / $0.625. |
OpenAI realtime, audio, and speech
| SKU | Text input | Text cached | Text output | Audio input | Audio cached | Audio output | Image input | Image cached | Notes |
|---|---|---|---|---|---|---|---|---|---|
| gpt-realtime-2 | $4.00 | $0.40 | $24.00 | $32.00 | $0.40 | $64.00 | $5.00 | $0.50 | Prices are per 1M tokens. |
| gpt-realtime-translate | n/a | n/a | n/a | n/a | n/a | $0.034 per minute | n/a | n/a | Minute-priced live translation. |
| gpt-realtime-whisper | n/a | n/a | n/a | n/a | n/a | $0.017 per minute | n/a | n/a | Minute-priced streaming transcription. |
| SKU | Price | Notes |
|---|---|---|
| gpt-4o-transcribe / diarize | $2.50 input / $10 output per 1M tokens; about $0.006 per minute | Speech to text. |
| gpt-4o-mini-transcribe | $1.25 input / $5 output per 1M tokens; about $0.003 per minute | Cheaper speech to text. |
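Token-priced and minute-priced audio rows can only be lined up after assuming how many audio tokens a minute of speech produces. The sketch below shows the conversion in both directions. The Gemini 3.5 Live Translate figures are the paired token and minute prices from the comparison table, which makes the implied token rate recoverable. The `HYPOTHETICAL_AUDIO_TOKENS_PER_MINUTE` value applied to gpt-realtime-2 is a placeholder, not a measured or published figure, so replace it with a rate observed in your own traffic.

```python
def implied_tokens_per_minute(usd_per_minute: float, usd_per_million_tokens: float) -> float:
    """Back out the token rate implied by a published token price and minute price."""
    return usd_per_minute / usd_per_million_tokens * 1_000_000


def usd_per_minute(usd_per_million_tokens: float, tokens_per_minute: float) -> float:
    """Convert a per-1M-token price into a per-minute price at an assumed token rate."""
    return usd_per_million_tokens * tokens_per_minute / 1_000_000


# Gemini 3.5 Live Translate: $3.50 in / $21 out per 1M tokens,
# presented alongside about $0.0053/min input and $0.0315/min output.
gemini_in_rate = implied_tokens_per_minute(0.0053, 3.50)    # roughly 1,500 tokens/min
gemini_out_rate = implied_tokens_per_minute(0.0315, 21.00)  # roughly 1,500 tokens/min

# Placeholder assumption only: real audio token rates vary by provider and model.
HYPOTHETICAL_AUDIO_TOKENS_PER_MINUTE = 1_000
realtime_2_audio_in = usd_per_minute(32.00, HYPOTHETICAL_AUDIO_TOKENS_PER_MINUTE)

print(round(gemini_in_rate), round(gemini_out_rate))
print(f"gpt-realtime-2 audio input at the placeholder rate: ${realtime_2_audio_in:.3f}/min")
```

The estimate moves linearly with the assumed token rate, so a wrong assumption shifts the comparison by the same factor. That is why the minute-priced and token-priced rows should not be ranked against each other without measured usage.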
OpenAI image, video, and embeddings
| SKU | Price | Notes |
|---|---|---|
| GPT Image 2 | Text $5 / $1.25 / n/a; image $8 / $2 / $30 per 1M tokens | Batch halves image token output to $15. |
| GPT Image 1.5 | Text $5 / $1.25 / $10; image $8 / $2 / $32 | Batch halves image token output to $16. |
| GPT Image 1 mini | Text $2 / $0.20 / n/a; image $2.50 / $0.25 / $8 | Batch halves image token output to $4. |
| Sora 2 | $0.10 per second | Video generation. |
| Sora 2 Pro | 720p $0.30/sec; 1024x1792 $0.50/sec; 1080p $0.70/sec | Premium video generation. |
| text-embedding models | Current public pricing page reviewed did not expose a clean embedding table | Verify model-specific embedding pricing in the docs or account console before carrying older prices into an estimate. |
If you are comparing dedicated image and video generation APIs rather than broader model stacks, "Direct Provider Pricing for Image + Video Generation APIs (USD)" breaks out that market separately.
OpenAI tool and platform charges
| Service | Price | Portable output? |
|---|---|---|
| Regional/data-residency endpoints for eligible models released on or after 2026-03-05 | +10% uplift | Regional deployment choice, not a portable artifact. |
| Containers | Per 20-minute session: 1GB $0.03; 4GB $0.12; 16GB $0.48; 64GB $1.92 | Generated files can be exported; the runtime state is OpenAI-only. |
| File Search storage | $0.10 per GB per day, first GB free | Your source files and extracted outputs can travel; the hosted vector store cannot. |
| File Search tool call | $2.50 per 1,000 calls | Calls use OpenAI-managed retrieval state. |
| Agent Kit file and image upload storage | $0.10 per GB-day after 1 GB free per account per month | Uploaded files are portable only if you also keep your own copy of them. |
| Web Search tool | Standard/reasoning preview $10 per 1,000 calls plus search content tokens; non-reasoning preview $25 per 1,000 calls with free search content tokens | Returned URLs/snippets are portable; the tool and session state are OpenAI-only. |
| Moderation | Free | Yes, as a classification result. |
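To see how the hosted-retrieval lines add up, here is a minimal sketch that estimates a monthly File Search bill from the storage and tool-call rows above. The function name, the 30-day month, and the example workload are illustrative assumptions, and the sketch assumes the first-GB-free allowance is applied to stored volume every day. Model tokens are not included.

```python
def file_search_monthly_cost(stored_gb: float, calls: int, days: int = 30) -> float:
    """Estimate a monthly File Search bill in USD (hypothetical helper)."""
    storage_per_gb_day = 0.10   # table row: $0.10 per GB per day
    free_gb = 1.0               # table row: first GB free (assumed to apply daily)
    per_1k_calls = 2.50         # table row: $2.50 per 1,000 tool calls

    billable_gb = max(stored_gb - free_gb, 0.0)
    storage = billable_gb * storage_per_gb_day * days
    tool_calls = calls / 1000 * per_1k_calls
    return storage + tool_calls


# Assumed workload: 21 GB stored all month, 200,000 retrieval calls.
print(round(file_search_monthly_cost(21, 200_000), 2))
```

In this assumed workload the tool-call line ($500) is far larger than the storage line ($60), which is why call volume is usually the number to model first.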
OpenAI portability read
OpenAI is easy to adopt and easy to deepen. The tradeoff is that switching pain grows fastest when File Search, Web Search, containers, and other hosted runtime layers become central to the product.
Anthropic
Anthropic currently has the cleanest cache math in the market. It tells you exactly what cache writes cost, what cache reads cost, and how batch pricing halves the bill. It also has one of the cleanest long-context stories among premium vendors, because Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 keep 1M context at standard rates.
Anthropic models
| SKU | Input | 5m cache write | 1h cache write | Cache read | Output | Batch input | Batch output | Notes |
|---|---|---|---|---|---|---|---|---|
| Fable 5 | $10.00 | $12.50 | $20.00 | $1.00 | $50.00 | $5.00 | $25.00 | 1M context stays at standard rates. |
| Mythos 5 | $10.00 | $12.50 | $20.00 | $1.00 | $50.00 | $5.00 | $25.00 | Limited availability. |
| Opus 4.8 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M context stays at standard rates; fast mode is $10 in / $50 out. |
| Opus 4.7 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M context stays at standard rates; fast mode is $30 in / $150 out. |
| Opus 4.6 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | 1M context stays at standard rates; fast mode is $30 in / $150 out. |
| Opus 4.5 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | $2.50 | $12.50 | Same base price as Opus 4.6 and 4.7. |
| Opus 4.1 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | $7.50 | $37.50 | Deprecated. |
| Opus 4 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | $7.50 | $37.50 | Retired except on Vertex AI. |
| Sonnet 4.6 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | $1.50 | $7.50 | 1M context stays at standard rates. |
| Sonnet 4.5 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | $1.50 | $7.50 | Still listed. |
| Sonnet 4 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | $1.50 | $7.50 | Retired except on Bedrock and Vertex AI. |
| Haiku 4.5 | $1.00 | $1.25 | $2.00 | $0.10 | $5.00 | $0.50 | $2.50 | Cheapest Claude 4.x tier. |
| Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | $4.00 | $0.40 | $2.00 | Retired except on Bedrock and Vertex AI. |
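The cache math in the table can be checked in a few lines. The sketch below compares sending the same prompt prefix repeatedly without caching against writing it to the cache once and reading it afterwards, using the Sonnet 4.6 row. The function name and the 50,000-token prefix are illustrative assumptions, and the sketch assumes every reuse lands inside the cache lifetime. Output tokens are left out because they cost the same either way.

```python
def prefix_cost(prefix_tokens: int, calls: int, input_rate: float,
                write_rate: float, read_rate: float) -> tuple[float, float]:
    """Return (uncached_usd, cached_usd) for sending one prefix `calls` times.

    Rates are USD per 1M tokens. Assumes the first call writes the cache and
    every later call is a cache read inside the cache lifetime.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    millions = prefix_tokens / 1_000_000
    uncached = millions * input_rate * calls
    cached = millions * (write_rate + read_rate * (calls - 1))
    return uncached, cached


# Sonnet 4.6 row: $3.00 input, $3.75 5-minute cache write, $0.30 cache read.
for calls in (1, 2, 10):
    uncached, cached = prefix_cost(50_000, calls, 3.00, 3.75, 0.30)
    print(calls, round(uncached, 4), round(cached, 4))
```

Under these assumptions the 5-minute cache costs more on a single call and is already cheaper on the second. Swapping in the 1-hour write rate ($6.00) moves the break-even to the third call.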
Anthropic tools and runtime costs
| Service | Price | Portability |
|---|---|---|
| Prompt caching | 5-minute writes are 1.25x base input; 1-hour writes are 2x; reads are 0.1x | Provider-only cache state. |
| US-only inference on Claude API | 1.1x multiplier on all token categories for Opus 4.6 and newer | Regional deployment choice, not a portable artifact. |
| Regional endpoints on Bedrock/Vertex | +10% on current 4.5-family regional endpoints | Cloud-channel choice, not a portable artifact. |
| Web Search | $10 per 1,000 searches plus normal token charges | Returned sources travel; the tool/session state does not. |
| Web Fetch | No additional line-item charge beyond normal token usage | Fetched text/output is portable. |
| Code execution | Free when used with web search or web fetch; otherwise first 1,550 hours per org per month free, then $0.05 per hour per container, 5-minute minimum | Generated files travel; container state does not. |
| Claude Managed Agents session runtime | $0.08 per session-hour | Runtime state is Anthropic-managed; outputs can be stored elsewhere. |
| Bash tool overhead | Adds 245 input tokens for the tool definition | The overhead is Claude-specific. |
| Text editor overhead | Adds 700 input tokens for the tool definition | The overhead is Claude-specific. |
| Computer use overhead | Adds roughly 466-499 tokens to the system prompt, 735 input tokens per tool definition, plus screenshot vision tokens | The session/runtime state is provider-only. |
Anthropic portability read
Anthropic stays relatively portable if you keep retrieval outside Claude and use the API mainly for reasoning and orchestration. The sticky pieces are cache, container state, and computer-use runtime.
Google Gemini API
Google is the clearest example of low model list prices living inside a large ecosystem. Gemini 2.5 Flash-Lite is still cheap at the model layer, while Gemini 3.1 Pro and Gemini 3.5 Flash are now the rows to watch higher up the stack. Google separately monetizes cache storage, grounding, embeddings, multimodal outputs, and file-search economics. Another important detail: Gemini output pricing explicitly includes thinking tokens, which makes direct output-token comparisons less clean.
Google core text and reasoning models
| SKU | Standard input | Standard output | Cache token price | Cache storage (per 1M tokens per hour) | Batch | Notes |
|---|---|---|---|---|---|---|
| Gemini 3.5 Flash | Text/image/video $1.50 | $9.00 | $0.15 text/img/video | $1.50/hr | Batch/Flex $0.75 in / $4.50 out | Priority is $2.70 in / $16.20 out. Output pricing includes thinking tokens. |
| Gemini 3.1 Pro Preview | <=200K $2.00; >200K $4.00 | <=200K $12.00; >200K $18.00 | $0.20 or $0.40 | $4.50 per 1M tokens/hour | Batch/Flex input/output halved | Search and Maps are $14 per 1,000 queries after free tier. Output pricing includes thinking tokens. |
| Gemini 3 Flash Preview | Text/image/video $0.50; audio $1.00 | $3.00 | $0.05 text/img/video; $0.10 audio | $1.00/hr | Batch/Flex input $0.25 or $0.50 audio; output $1.50 | Output pricing includes thinking tokens. |
| Gemini 3.1 Flash-Lite | Text/image/video $0.25; audio $0.50 | $1.50 | $0.025 text/img/video; $0.05 audio | $0.50/hr | Batch/Flex input $0.125 or $0.25 audio; output $0.75 | Priority is $0.45 input and $2.70 output. |
| Gemini 2.5 Pro | <=200K $1.25; >200K $2.50 | <=200K $10.00; >200K $15.00 | $0.125 or $0.25 | $4.50/hr | Input/output halved | Search grounding $35 per 1,000 grounded prompts; Maps $25 per 1,000. Output pricing includes thinking tokens. |
| Gemini 2.5 Flash | Text/image/video $0.30; audio $1.00 | $2.50 | $0.03 text/img/video; $0.10 audio | $1.00/hr | Input $0.15 or $0.50 audio; output $1.25 | Output pricing includes thinking tokens. |
| Gemini 2.5 Flash-Lite | Text/image/video $0.10; audio $0.30 | $0.40 | $0.01 text/img/video; $0.03 audio | $1.00/hr | Input $0.05 or $0.15 audio; output $0.20 | Output pricing includes thinking tokens. |
| Gemini 2.5 Flash-Lite Preview | Same as 2.5 Flash-Lite | Same as 2.5 Flash-Lite | Same as 2.5 Flash-Lite | $1.00/hr | Same as 2.5 Flash-Lite | Preview row still listed separately. |
| Gemini Robotics-ER 1.6 Preview | Text/image/video $1.00; audio $2.00 | $5.00 | n/a | n/a | Batch $0.50 or $1 audio in / $2.50 out | Search grounding is $14 per 1,000 search queries after free tier. |
| Gemini 2.5 Computer Use Preview | <=200K $1.25; >200K $2.50 | <=200K $10.00; >200K $15.00 | n/a | n/a | n/a | Specialized computer-use row. |
| Gemini 2.0 Flash | Text/image/video $0.10; audio $0.70 | $0.40 | $0.025 text/img/video; $0.175 audio | $1.00/hr | Input $0.05 or $0.35 audio; output $0.20 | Deprecated. Shutdown June 1, 2026. |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | n/a | n/a | $0.0375 in / $0.15 out | Deprecated. Shutdown June 1, 2026. |
| Gemma 4 | Free-tier-only rows in the captured pricing page | Free-tier-only rows in the captured pricing page | n/a | n/a | n/a | Important open model family, but not shown as a normal paid Developer API row in the pricing page reviewed. |
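The long-context rows above are easiest to read as a step function. Here is a minimal sketch for the Gemini 3.1 Pro Preview row. It assumes that a prompt above 200K tokens moves the whole request, input and output, onto the higher rates; that billing rule and the function name are assumptions to confirm against Google's pricing page before relying on the numbers.

```python
def gemini_31_pro_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD for the Gemini 3.1 Pro Preview row.

    Assumption: a prompt above 200K tokens reprices the entire request.
    """
    threshold = 200_000
    if prompt_tokens <= threshold:
        input_rate, output_rate = 2.00, 12.00   # <=200K row, USD per 1M tokens
    else:
        input_rate, output_rate = 4.00, 18.00   # >200K row, USD per 1M tokens
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


print(gemini_31_pro_cost(200_000, 2_000))  # exactly at the threshold
print(gemini_31_pro_cost(200_001, 2_000))  # one token over
```

Under that assumption, a single extra prompt token takes the request from roughly $0.42 to roughly $0.84, so it can be worth trimming prompts that sit just above the boundary.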
Google image, audio, TTS, and video
| SKU | Price | Notes |
|---|---|---|
| Gemini 3.1 Flash Image | Text/image input $0.50; text+thinking output $3 | Batch row is $0.25 in / $1.50 out. |
| Gemini 3 Pro Image | Text/image input $2.00; text+thinking output $12 | Batch/Flex row is $1 in / $6 out; Priority is $3.60 in / $21.60 out. |
| Gemini 3.5 Live Translate | Audio input $3.50; audio output $21.00 | Google also shows approximate minute equivalents of $0.0053/min input and $0.0315/min output. |
| Gemini 2.5 Flash Native Audio | Text input $0.50; audio/video input $3.00; text output $2.00; audio output $12.00 | Live API native audio row. |
| Gemini 2.5 Flash Image | Text/image input $0.30; image output $0.039 per image | Batch input $0.15; image output $0.0195 per image. |
| Gemini 2.5 Flash Preview TTS | Text input $0.50; audio output $10.00 | Batch $0.25 in / $5 out. |
| Gemini 2.5 Pro Preview TTS | Text input $1.00; audio output $20.00 | Batch $0.50 in / $10 out. |
| Imagen 4 Fast | $0.02 per image | Image generation. |
| Imagen 4 Standard | $0.04 per image | Image generation. |
| Imagen 4 Ultra | $0.06 per image | Image generation. |
| Veo 3.1 Standard | 720p/1080p $0.40 per second; 4K $0.60 per second | Video generation. |
| Veo 3.1 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Cheaper video tier. |
| Veo 3.1 Lite | 720p $0.05/sec; 1080p $0.08/sec | Lowest current Veo 3.1 public row. |
| Veo 3 Standard | $0.40 per second | Older video row still listed. |
| Veo 3 Fast | 720p $0.10/sec; 1080p $0.12/sec; 4K $0.30/sec | Older fast video row still listed. |
| Veo 2 | $0.35 per second | Older video row still listed. |
Google embeddings and tools
| Service | Price | Portability |
|---|---|---|
| Gemini Embedding 2 | Text $0.20 per 1M tokens; image $0.45; audio $6.50; video $12.00 | Batch halves these prices. Embeddings can be exported, but in practice you usually re-embed when you switch retrieval models. |
| Gemini Embedding 001 | $0.15 per 1M text tokens; batch $0.075 | Same caveat: vectors travel, but retrieval quality usually wants matching query/document models. |
| Google Search grounding | Gemini 2.5: $35 per 1,000 grounded prompts after free tier; Gemini 3: $14 per 1,000 search queries after free tier | Returned URLs/snippets travel; grounding service and session state do not. |
| Google Maps grounding | $25 per 1,000 grounded prompts after free tier | Returned data is reusable; the service itself is Google-only. |
| Code execution | No separate runtime charge; billed only as normal model tokens | Generated files/results can travel; runtime state does not. |
| URL context | No separate tool fee; billed as normal input tokens | Fetched text/output is portable. |
| File Search | Tool line itself is free, but embeddings are billed and retrieved document tokens are billed as model tokens | Your files travel; the hosted retrieval layer does not. |
Google portability read
Gemini itself is price-competitive. The switching cost rises when Search, Maps, cache storage, and File Search become part of the core workflow instead of just optional tools.
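Cache storage is the one line here that is billed by the hour rather than per request, so it only pays off above a certain reuse rate. The sketch below computes that break-even. It assumes the "/hr" storage figures in the model table are per 1M tokens per hour (as the Gemini 3.1 Pro row states explicitly) and ignores the one-time cost of creating the cache. The function name is hypothetical.

```python
def cache_break_even_hits_per_hour(input_rate: float, cached_rate: float,
                                   storage_rate_per_hour: float) -> float:
    """Hits per hour at which cached-read savings equal the storage fee.

    All rates are USD per 1M tokens (storage is per 1M tokens per hour),
    so the size of the cached prefix cancels out of the ratio.
    """
    saving_per_hit = input_rate - cached_rate
    if saving_per_hit <= 0:
        raise ValueError("cached rate must be below the standard input rate")
    return storage_rate_per_hour / saving_per_hit


# Gemini 2.5 Pro, <=200K tier: $1.25 input, $0.125 cached, $4.50 storage.
print(cache_break_even_hits_per_hour(1.25, 0.125, 4.50))
```

With these rows, Gemini 2.5 Pro needs about four cache hits per hour to cover storage. Gemini 2.5 Flash-Lite ($0.10 input, $0.01 cached, $1.00 storage) needs about eleven, because each hit saves less.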
Amazon Nova and Bedrock-side economics
Amazon Nova can look extremely cheap on the pure model rows, especially Nova Micro and Nova Lite. In practice, though, you are buying Bedrock economics: model inference plus whatever you choose from Prompt Optimization, Guardrails, Knowledge Bases, Data Automation, browser automation, and other AWS-shaped platform features. If your data, IAM, networking, and operations already live in AWS, that integration value may dominate list-price differences.
Amazon Nova models
| SKU | Public list price | Confidence | Notes |
|---|---|---|---|
| Nova Micro | $0.035 in / $0.14 out | High | Clearly exposed in official public pricing snippets. |
| Nova Lite | $0.06 in / $0.24 out | High | Clearly exposed in official public pricing snippets. |
| Nova Pro | $0.80 in / $3.20 out | High | Clearly exposed in official public pricing snippets. |
| Nova Pro latency optimized | $1.00 in / $4.00 out | High | Clearly exposed in official public pricing snippets. |
| Nova 2 Lite | Pricing page snippet showed $0.30 input / $2.50 output | Medium | The public static page did not expose the labeled row cleanly enough for more detail. |
| Nova 2 Pro Preview | Pricing page snippet showed a row starting with repeated $2.1875 values and $17.50 output | Low | I am not reproducing the row as fully verified because the table labels were not visible in the static capture. |
| Nova 2 Omni Preview | Not cleanly exposed in the public static docs view | Low | Product exists on the model catalog page. |
| Nova 2 Sonic | Not cleanly exposed in the public static docs view | Low | Speech model family. |
| Nova Premier | Not cleanly exposed in the public static docs view | Low | Most capable multimodal understanding model. |
| Nova Canvas | Not cleanly exposed in the public static docs view | Low | Image generation model. |
| Nova Reel | Not cleanly exposed in the public static docs view | Low | Video generation model. |
| Nova Multimodal Embeddings | Not cleanly exposed in the public static docs view | Low | Embedding model. |
AWS platform-side charges around Nova
| Service | Price | Portability |
|---|---|---|
| Nova Act | $4.75 per agent-hour | Generated browser outputs can be exported; the workflow runtime is AWS-specific. |
| Nova Forge | Annual subscription; public price not disclosed | Model-building workflow is provider-specific. |
| Bedrock Prompt Optimization | $0.03 per 1,000 tokens | Prompt suggestions can travel; the service itself is AWS-specific. |
| Knowledge Bases / Guardrails / Data Automation | Separate Bedrock charges exist, but the static public excerpts I captured did not expose every current numeric row cleanly | These services are AWS-shaped platform features, not portable assets. |
Amazon Nova portability read
Nova can be cheap at the model layer, but the real lock-in lives in the Bedrock wrapper: Knowledge Bases, Guardrails, routing, optimization, and agent/browser runtime state.
xAI
xAI is more cost-competitive than many people assume, especially if you separate model cost from ecosystem branding. The current public model page has a simpler shape than older Grok 4.20-era pricing: Grok 4.3 is $1.25 in / $2.50 out, Grok Build 0.1 is $1 in / $2 out, and the pricing payload also maps current Grok 4.20 rows to $1.25 in / $0.20 cached / $2.50 out. xAI's tool charges are unusually clear. The key distinction is between portable one-shot outputs and xAI-managed retrieval state such as Collections.
xAI models and modes
| SKU | Input | Cached input | Output | Notes |
|---|---|---|---|---|
| Grok 4.3 | $1.25 | $0.20 | $2.50 | Current main chat row. The public model page exposes input and output; the embedded pricing payload exposes cached input. |
| Grok 4.20 0309 reasoning / non-reasoning | $1.25 | $0.20 | $2.50 | Current pricing payload rows list both modes at the same rate. |
| Grok 4.20 Multi Agent 0309 | $1.25 | $0.20 | $2.50 | Separate multi-agent row with the same token rates. |
| Grok Build 0.1 / Grok Code Fast aliases | $1.00 | $0.20 | $2.00 | Current coding row. |
| Realtime API | n/a | n/a | $0.05 per minute | Also shown as $3 per hour. |
| Text to speech | n/a | n/a | $15 per 1M characters | Speech generation. |
| Speech to text | n/a | n/a | $0.10/hour REST; $0.20/hour streaming | Speech-to-text uses hourly units, not tokens. |
| Grok Imagine Image | $0.002 per input image; quality mode $0.01 per input image | n/a | Output starts at $0.02 per image; quality mode $0.05-$0.07 per image | Image generation units are per image, not per token. |
| Grok Imagine Video | Input $0.01/sec or $0.002/image | n/a | Output starts at $0.05-$0.07/sec; Video 1.5 is $0.08-$0.14/sec | Video generation units are per second or per image, depending on mode. |
xAI tools
| Service | Price | Portability |
|---|---|---|
| Batch API | 50% discount across input, output, cached input, and reasoning tokens for supported text/language models | Batch queue is provider-only; outputs are portable. |
| Web Search | $5 per 1,000 calls | Returned links/snippets are portable; search tool state is xAI-only. |
| X Search | $5 per 1,000 calls | Returned content is portable; the service is xAI/X-specific. |
| Code Execution | $5 per 1,000 calls | Generated files/results travel; runtime state does not. |
| File Attachments search | $10 per 1,000 calls | Your files travel; attachment search state is provider-specific. |
| Collections Search | $2.50 per 1,000 calls | Collections are xAI-managed knowledge bases and are not cross-provider assets. |
| view_image / video tools | Token-based | Results travel; the tool path does not. |
xAI portability read
xAI is attractive for cheap search-grounded or tool-heavy experimentation. It becomes much less portable once Collections and native search workflows move from peripheral features to core product state.
Mistral
Mistral has the strongest portability story in this comparison because it repeatedly sells reusable artifacts: OCR output, embeddings, and open-weight models. You can prototype on hosted inference, then self-host or move across clouds later without redesigning the whole stack around a proprietary retrieval product.
Mistral hosted SKUs and utilities
| SKU | Input | Output | Notes |
|---|---|---|---|
| Mistral Large 3 | $0.50 | $1.50 | Flagship open-weight model. |
| Mistral Medium 3.5 | $1.50 | $7.50 | Current Medium row on Mistral's pricing page. |
| Mistral Small 4 | $0.10 | $0.30 | Cheap general text model. |
| Ministral 3 | 3B $0.10; 8B $0.15; 14B $0.20 | 3B $0.10; 8B $0.15; 14B $0.20 | Lightweight family; prices are input/output per 1M tokens. |
| Devstral Small 2 | $0.10 | $0.30 | Cheap code-oriented model. |
| Devstral 2 | $0.40 | $2.00 | Larger open-weight coding/agentic model. |
| Codestral | $0.30 | $0.90 | Code model. |
| Voxtral Small | $0.004 per audio minute; $0.10 text input | $0.40 | Audio-input model with both minute and token pricing exposed. |
| Voxtral TTS | $0.016 per 1,000 characters | n/a | Text-to-speech generation. |
| Mistral Embed | $0.10 per 1M tokens | n/a | Text embeddings. |
| Codestral Embed | $0.15 per 1M tokens | n/a | Code embeddings. |
| OCR 3 | $2 per 1,000 pages | n/a | Portable OCR / document extraction. |
| OCR 3 Annotated | $3 per 1,000 pages | n/a | Annotated OCR output. |
Mistral portability read
Mistral is the cleanest escape-hatch story in this market. If keeping switching costs low over the medium term matters most, it is one of the strongest answers.
Cohere
Cohere remains interesting for enterprise text and retrieval-heavy work. Command A is not the cheapest flagship, but Command R and Command R7B are aggressively priced, and Cohere's broader product story still centers on retrieval, control, and private deployment rather than consumer-platform sprawl. The main caveat here is pricing visibility for some non-generation SKUs, so those rows stay conservative.
Cohere current public rows
| SKU | Input | Output | Notes |
|---|---|---|---|
| Command A | $2.50 | $10.00 | Current flagship enterprise text model. |
| Command A Reasoning | Not cleanly exposed in the public static docs view | Not cleanly exposed in the public static docs view | Public model exists; I did not verify a stable paid list price row in the static view. |
| Command A Vision | Not cleanly exposed in the public static docs view | Not cleanly exposed in the public static docs view | Public model exists; pricing row was not exposed in the snippets I could verify. |
| Command R | $0.15 | $0.60 | Competitive general text tier. |
| Command R7B | $0.0375 | $0.15 | Very low cost named enterprise model. |
| Command R+ 08-2024 | $2.50 | $10.00 | Older refresh row documented in Cohere's changelog. Treat as legacy unless your account still exposes it. |
| Rerank models | Query-priced | n/a | Cohere's public docs explain the pricing model, but the static public view I reviewed did not surface a stable current dollar figure on Cohere's own site. |
| Embed models | Token-priced | n/a | Same caveat: pricing model is public, but the static docs view I reviewed did not surface a stable current dollar figure on Cohere's own site. |
Cohere portability read
Cohere is relatively portable if you keep your own vector database or use private deployment. Its stickier value is in managed enterprise deployment and private hosting, not a giant public tooling layer.
DeepSeek
DeepSeek has one of the simplest serious price sheets in the market. It currently exposes a very low cache-hit price, a still-low cache-miss input price, and a low output price across both the non-thinking and thinking aliases. The pricing page is refreshingly direct.
The more strategically important point is API compatibility. DeepSeek explicitly supports OpenAI-compatible and Anthropic-compatible formats, which can lower migration cost even beyond the headline token savings.
DeepSeek current public rows
| SKU | Cache-miss input | Cache-hit input | Output | Notes |
|---|---|---|---|---|
| deepseek-v4-flash / deepseek-chat / deepseek-reasoner | $0.14 | $0.0028 | $0.28 | Compatibility aliases map to DeepSeek-V4-Flash non-thinking and thinking modes. |
| deepseek-v4-pro | $0.435 | $0.003625 | $0.87 | Current public V4-Pro row. |
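Because the cache-hit and cache-miss prices above differ by so much, the effective input price depends almost entirely on your hit rate. A minimal sketch of that blend follows; the function name and the 80% hit rate are illustrative assumptions, not measured figures.

```python
def blended_input_price(hit_fraction: float, hit_price: float,
                        miss_price: float) -> float:
    """Effective input price in USD per 1M tokens for a given cache-hit share."""
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    return hit_fraction * hit_price + (1.0 - hit_fraction) * miss_price


# deepseek-v4-flash row: $0.0028 cache hit, $0.14 cache miss.
# Assumed workload: 80% of input tokens are cache hits.
print(round(blended_input_price(0.80, 0.0028, 0.14), 5))
```

At that assumed hit rate the effective input price is about $0.03 per 1M tokens, roughly a fifth of the cache-miss rate.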
DeepSeek portability read
The cache is provider-specific, but the overall stack is unusually portable because the API surface mirrors other major ecosystems.
Qwen via Alibaba Cloud Model Studio
Qwen can look astonishingly cheap, but it is easy to read the pricing page incorrectly. Deployment mode, region, context tier, and thinking mode all affect the bill. There is no single "Qwen price"; there are several Qwen price surfaces, and the cheapest headline row usually belongs to a specific deployment mode and context bracket. That complexity does not make Qwen unattractive, but it does mean you have to read the table carefully.
Qwen deployment-mode overview
| Deployment mode | Qwen3-Max | Qwen3.5-Plus | Qwen3.5-Flash | Notes |
|---|---|---|---|---|
| International | Starts at $1.20 / $6.00 | Starts at $0.40 / $2.40 | Starts at $0.10 / $0.40 | Higher-priced global commercial tier. |
| Global | Starts at $0.359 / $1.434 | Starts at $0.115 / $0.688 | Starts at $0.029 / $0.287 | Cheapest captured mode for current flagship rows. |
| US | n/a in captured rows | Qwen-Plus starts at $0.40 / $1.20 | Qwen-Flash starts at $0.05 / $0.40 | US uses qwen-plus and qwen-flash naming in the captured rows. |
| Chinese Mainland | Starts at $0.359 / $1.434 | Starts at $0.115 / $0.688 | Starts at $0.029 / $0.287 | Current mainland minima match the captured rows. |
Qwen detailed tiered rows captured from the public pricing page
| SKU | Tier | Input | Output | Notes |
|---|---|---|---|---|
| qwen3-max / qwen3-max-2026-01-23 | 0-32K | $0.359 | $1.434 | Global row. |
| qwen3-max / qwen3-max-2026-01-23 | 32K-128K | $0.574 | $2.294 | Global row. |
| qwen3-max / qwen3-max-2026-01-23 | 128K-252K | $1.004 | $4.014 | Global row. |
| qwen3-max-2025-09-23 / qwen3-max-preview | 0-32K | $0.861 | $3.441 | Older, more expensive snapshot/previews. |
| qwen3-max-2025-09-23 / qwen3-max-preview | 32K-128K | $1.434 | $5.735 | Older, more expensive snapshot/previews. |
| qwen3-max-2025-09-23 / qwen3-max-preview | 128K-252K | $2.151 | $8.602 | Older, more expensive snapshot/previews. |
| qwen3.5-flash | 0-128K | $0.029 | $0.287 | Global row. Supports batch and context cache discounts. |
| qwen3.5-flash | 128K-256K | $0.115 | $1.147 | Global row. |
| qwen3.5-flash | 256K-1M | $0.172 | $1.720 | Global row. |
| qwen-flash-us | 0-256K | $0.05 | $0.40 | US row. Supports batch at half price. |
| qwen-flash-us | 256K-1M | $0.25 | $2.00 | US row. Supports batch at half price. |
| qwen3.5-plus | 0-256K | $0.40 | $2.40 | International row. |
| qwen3.5-plus | 256K-1M | $0.50 | $3.00 | International row. |
| qwen-plus non-thinking | 0-256K | $0.40 | $1.20 | International row. |
| qwen-plus non-thinking | 256K-1M | $1.20 | $3.60 | International row. |
| qwen-plus thinking | Tiered | Much higher than non-thinking mode | The public static capture did not expose the full row cleanly enough to reproduce every number with confidence | Thinking mode pricing exists and is materially higher. |
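One way to avoid misreading the tiered rows is to encode them as a lookup. The sketch below does that for the qwen3.5-flash Global rows. It assumes the tier is chosen by the request's input token count, that "K" means 1,000 tokens, and that each upper bound is inclusive; those rules and the names used are assumptions to verify against the Model Studio pricing page.

```python
# (upper bound in input tokens, input USD per 1M, output USD per 1M)
QWEN35_FLASH_GLOBAL = [
    (128_000, 0.029, 0.287),
    (256_000, 0.115, 1.147),
    (1_000_000, 0.172, 1.720),
]


def qwen35_flash_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD from the tiered Global rows."""
    for upper, in_rate, out_rate in QWEN35_FLASH_GLOBAL:
        if input_tokens <= upper:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the largest listed tier")


small = qwen35_flash_cost(100_000, 1_000)
large = qwen35_flash_cost(200_000, 1_000)
print(round(small, 6), round(large, 6), round(large / small, 1))
```

Under these assumptions, doubling the prompt from 100K to 200K tokens raises the cost of the request more than sevenfold, because both the input and the output rate step up.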
Commercial multimodal Qwen families listed without a clean numeric row in the captured static view
| Family | Status | Price visibility | Portability |
|---|---|---|---|
| Qwen-VL | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Model outputs travel; Model Studio deployment economics do not. |
| QVQ | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Same caveat. |
| Qwen-Omni | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Same caveat. |
| Qwen-Omni-Realtime | Commercial multimodal family is publicly listed | The static excerpts I reviewed did not expose a clean numeric row | Same caveat. |
Qwen portability read
Qwen is attractive if you want a very cheap hosted commercial tier now without giving up the broader strategic value of an open-source/open-weight family later.
A note on Meta and the Llama API
Meta's Llama ecosystem is strategically important and absolutely belongs in any serious conversation about portability, self-hosting, and open-weight alternatives. But I am not putting a numeric Llama API row into the main price comparison because, in the public docs view I could verify today, I could not confirm a stable public pay-as-you-go token sheet accessible without preview gating or login. That is a deliberate choice made for accuracy, not an accidental omission.
What can you actually reuse across providers?
This is the part many teams get wrong: in practice, they are buying a mix of portable artifacts and provider-only state.
Provider-by-provider portability map
| Provider | Outputs that travel well | Provider-bound services |
|---|---|---|
| OpenAI | Text, JSON, code, transcripts, generated files, original uploaded files | File Search vector stores, Web Search session state, Containers, response/thread/runtime state |
| Anthropic | Text, JSON, tool results, fetched web content, generated files | Prompt cache, code execution container state, computer-use runtime state |
| Google | Text, grounded links/snippets, generated files, original uploaded files | Search grounding session state, Maps grounding, cache storage, File Search retrieval state |
| Amazon Nova / Bedrock | Text, generated assets, source files | Knowledge Bases, Guardrails, routing, prompt optimization workflow, browser/agent runtimes |
| xAI | Text, JSON, code, generated files, source attachments | Collections, attachment search state, tool/runtime state |
| Mistral | Text, OCR output, embeddings, open-weight checkpoints, source files | Very little compared with the rest; the main lock-in comes only if you let Mistral own retrieval instead of you |
| Cohere | Text, rerank scores, embeddings, private deployment artifacts | Managed public API usage is portable enough; Model Vault and private deployment reduce lock-in further |
| DeepSeek | Text, JSON, source files, cached prompt-independent outputs | Prompt cache is provider-specific, but the API surface is unusually portable because it mirrors other ecosystems |
| Qwen / Alibaba Cloud | Text, JSON, open-source/open-weight family artifacts, source files | Model Studio deployment mode economics, cache, and batch plumbing are Alibaba-specific |
What moves cleanly, what does not
| Artifact or service | Cross-provider? | Why | Examples |
|---|---|---|---|
| Raw outputs: text, JSON, code | Yes | They are just artifacts once you store them outside the vendor. | Any chat/completion output from any provider. |
| Source files: PDFs, CSVs, images, audio | Yes | Keep the originals in your own storage. | The same PDF can be sent to OpenAI, Anthropic, Google, Mistral, or xAI. |
| OCR and transcripts | Yes | Plain text, markdown, tables, and JSON are portable. | Mistral OCR output; OpenAI or Google transcripts; Whisper transcripts. |
| Embeddings and vectors | Partly | You can export vectors, but in practice you usually re-embed when you switch retrieval models so query and document embeddings stay aligned. | Gemini Embedding, Mistral Embed, Cohere Embed, OpenAI embedding models. |
| Prompt cache | No | Cache state is defined by the provider's own runtime and token accounting. | OpenAI cached input, Anthropic cache write/read, Google cache storage, DeepSeek cache-hit pricing. |
| Hosted file search / managed vector stores | No | You can move the source files, not the hosted index as a portable service. | OpenAI File Search, Google File Search, xAI Collections, Bedrock Knowledge Bases. |
| Grounded web or map search results | Partly | Returned links/snippets are reusable, but the tool path and session state are provider-specific. | OpenAI Web Search, Anthropic Web Search, Google Search/Maps grounding, xAI Web Search/X Search. |
| Code execution and containers | No, except for exported outputs | The runtime state lives inside the provider. | OpenAI Containers, Anthropic code execution, Google code execution, xAI code execution, Nova Act workflows. |
| Open-weight models and self-hosted checkpoints | Yes | You can move them across clouds or self-host them. | Mistral open-weight models, the wider Qwen family, Meta Llama families. |
| API client compatibility | Partly | If a provider mirrors another provider's API shape, migration is easier even if the model itself is different. | DeepSeek supports OpenAI-compatible and Anthropic-compatible formats. |
That table gives you a simple rule of thumb:
- source artifacts usually travel
- final outputs usually travel
- provider-managed retrieval usually does not
- prompt caches do not
- runtimes and containers do not
- embeddings are only partly portable in practice
- open-weight checkpoints are the strongest long-term escape hatch
A simple example: search-grounded pricing can invert the ranking
If your workflow calls search on every turn, the search line itself can dominate the bill surprisingly quickly.
| Provider | Search tool line only | What that means |
|---|---|---|
| xAI | $5 per 1,000 calls | Cheapest clearly exposed first-party search tool fee in this review. |
| OpenAI | $10 per 1,000 calls for standard/reasoning preview modes, plus search content tokens | The tool line is moderate, but tokens still apply. |
| Anthropic | $10 per 1,000 searches, plus normal tokens | Very similar headline tool fee to OpenAI. |
| Google Gemini 3 | $14 per 1,000 search queries after free tier | Cheaper than Gemini 2.5 grounding, pricier than xAI and OpenAI/Anthropic. |
| Google Gemini 2.5 | $35 per 1,000 grounded prompts after free tier | Grounding is meaningfully more expensive here than on Gemini 3. |
That is before you add the model tokens. A provider that looks cheap on generation alone can become expensive in a search-heavy product, and a provider that looks expensive on generation alone can become reasonable if its tool line is cheaper or more predictable for your workload.
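A small worked example makes the inversion concrete. The sketch below prices 1,000 turns on two rows from this article, first on tokens alone and then with the search line added. The workload (2,000 input and 500 output tokens per turn, one search per turn) is an invented assumption. Free tiers and search content tokens are ignored, and one grounded prompt is treated as one search call.

```python
def cost_per_1k_turns(input_rate: float, output_rate: float,
                      search_fee_per_1k: float,
                      input_tokens: int = 2_000,
                      output_tokens: int = 500) -> tuple[float, float]:
    """Return (tokens_only_usd, with_search_usd) for 1,000 turns.

    Token rates are USD per 1M tokens; assumes one search call per turn.
    """
    per_turn = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    tokens_only = per_turn * 1_000
    return tokens_only, tokens_only + search_fee_per_1k


# (input rate, output rate, search fee per 1,000 calls) from the tables above.
rows = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40, 35.0),
    "Grok 4.3": (1.25, 2.50, 5.0),
}
for name, (in_rate, out_rate, search_fee) in rows.items():
    tokens_only, with_search = cost_per_1k_turns(in_rate, out_rate, search_fee)
    print(f"{name}: tokens ${tokens_only:.2f}, with search ${with_search:.2f}")
```

On tokens alone, the Flash-Lite row is about nine times cheaper in this scenario ($0.40 versus $3.75). With a search on every turn it becomes about four times more expensive ($35.40 versus $8.75).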
The practical buying guide
Here is the cleanest way to think about the market right now.
If you want the cheapest text possible
Start with the low-cost rows that are genuinely public and current: Qwen3.5-Flash Global, Cohere Command R7B, Amazon Nova Micro, Mistral Small 4, Devstral Small 2, Gemini 2.5 Flash-Lite, and DeepSeek. Then test the output quality you actually need. The interesting part is no longer finding a cheap model. It is finding a cheap model that does not force you into expensive surrounding services later.
If you want premium reasoning without instantly paying premium-platform tax
Gemini 3.1 Pro Preview, Grok 4.3, and Qwen3-Max all look more price-aggressive on list price than the top Anthropic and OpenAI premium rows. That does not tell you which one is best for your tasks. It tells you where to start testing, rather than assuming the most visible premium models are also the most financially sensible.
If you want the easiest long-context economics
Anthropic is the cleanest answer in the public sheets I reviewed because Fable 5, Mythos 5, Opus 4.8, Opus 4.7, Opus 4.6, and Sonnet 4.6 keep 1M context at standard rates. OpenAI and Google can still be excellent choices, but their price jumps are explicit enough that you should model them before you architect around giant prompts.
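One way to model a long-context price jump before committing to giant prompts is a short function like the one below. It is a minimal sketch, not any provider's billing logic. The threshold and both rates are hypothetical placeholders, and it assumes the higher rate applies to the whole request once the prompt crosses the threshold. Confirm that assumption against each provider's pricing page.

```python
def request_input_cost(prompt_tokens: int, base_rate: float,
                       long_rate: float, threshold: int) -> float:
    """Input cost in USD for one request.

    Rates are USD per 1M input tokens. Assumption: a prompt longer than
    `threshold` tokens is billed entirely at `long_rate`.
    """
    rate = long_rate if prompt_tokens > threshold else base_rate
    return prompt_tokens * rate / 1_000_000


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    BASE_RATE = 2.0        # USD per 1M tokens at or below the threshold
    LONG_RATE = 4.0        # USD per 1M tokens above the threshold
    THRESHOLD = 200_000    # tokens

    for tokens in (100_000, 200_000, 200_001, 800_000):
        cost = request_input_cost(tokens, BASE_RATE, LONG_RATE, THRESHOLD)
        print(f"{tokens:>7} prompt tokens: ${cost:.4f}")
```

Under that assumption, a single extra token past the threshold doubles the input cost of the request in this example, which is why prompt length distributions matter as much as average prompt length. A flat-rate provider corresponds to setting `long_rate` equal to `base_rate`.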
If you want the lowest future switching pain
Mistral plus your own storage and retrieval stack is probably the cleanest answer. Anthropic with self-managed retrieval is also good. DeepSeek is especially interesting if you already have OpenAI-flavored or Anthropic-flavored client code and want lower migration friction. The deeper you go into File Search, Collections, Knowledge Bases, cache storage, containers, or provider-native search/grounding, the less meaningful model-layer portability becomes.
Common questions about AI token pricing
What actually determines the real cost of an LLM stack in 2026?
In this comparison, the real cost is a cost shape, not just a token rate. Input tokens, output tokens, cached tokens, cache writes, cache storage, batch discounts, long-context repricing, search-grounding fees, hosted retrieval fees, document processing charges, and runtime costs can all change the final bill materially.
Is headline token price enough to compare providers?
No. Headline token pricing still matters, but on its own it is incomplete. Search, retrieval, file processing, runtime, and provider-owned state can all change which stack is actually cheaper for a real workload.
Which providers have the lowest future switching pain?
The shortlist in this article is Mistral plus your own storage and retrieval stack, Anthropic with self-managed retrieval, and DeepSeek if API compatibility matters to you. In each case, lower switching pain comes from keeping retrieval and long-lived state under your control.
Are embeddings and managed file search portable across providers?
Not in the same way. Source files, OCR output, transcripts, and final outputs travel well. Embeddings are only partly portable in practice because many teams re-embed when they switch retrieval models. Managed file search, hosted vector stores, caches, and runtime state are provider-specific.
Bottom line
By 2026, "token price" on its own is an incomplete buying metric. What matters is the full cost shape: model price, cache behavior, long-context behavior, tool pricing, retrieval pricing, file-processing pricing, runtime pricing, and ultimately how much provider-owned state your application accumulates.
If you care most about the lowest raw list price, the market now gives you several clear places to start testing. If you care most about premium managed stacks, OpenAI, Google, AWS, and xAI offer the deepest native ecosystems. If you care most about reducing future switching pain, Mistral, Anthropic with BYO retrieval, and DeepSeek remain the most strategically interesting combinations.
The most honest way to compare providers in 2026 is not to ask which model is cheapest. It is to ask which full cost shape you are actually willing to buy.
Official sources checked
- OpenAI developer pricing docs: https://developers.openai.com/api/docs/pricing/
- Anthropic Claude pricing: https://platform.claude.com/docs/en/about-claude/pricing
- Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing
- Amazon Nova pricing: https://aws.amazon.com/nova/pricing/
- Amazon Nova models: https://aws.amazon.com/nova/models/
- Amazon Bedrock pricing: https://aws.amazon.com/bedrock/pricing/
- xAI models and pricing: https://docs.x.ai/developers/models
- xAI model pages and tool docs: https://docs.x.ai/
- Mistral pricing: https://mistral.ai/pricing/
- Mistral model docs: https://docs.mistral.ai/getting-started/models/compare
- Cohere docs: https://docs.cohere.com/docs/models
- Cohere pricing explainer: https://docs.cohere.com/docs/how-does-cohere-pricing-work
- DeepSeek pricing: https://api-docs.deepseek.com/quick_start/pricing
- Alibaba Cloud Model Studio model pricing: https://www.alibabacloud.com/help/en/model-studio/model-pricing
- Meta Llama API overview: https://llama.developer.meta.com/docs/overview/