AI Token Costs, Explained

What a token actually is

A token is a small unit an AI system uses to measure text, and in some systems token-like units are also used for images, audio, video, or other supported inputs. It is not the same as a word. A token can be a word fragment, a short word, punctuation, part of a number, or a unit created when the provider encodes non-text content.

That matters because a dashboard showing 117 million tokens does not mean 117 million full words. It means the model processed 117 million billable units across prompts, outputs, context, tool results, chat history, file contents sent to the model, or other supported content.
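To make the word-versus-token gap concrete, here is a minimal sketch using a toy splitter. It is not a real tokenizer: the pattern, the four-letter fragment size, and the sample sentence are all assumptions chosen only to show that fragments, digits, and punctuation each count as separate units. A provider's tokenizer will split the same text differently.

```python
import re

# Toy pattern for illustration only (an assumption, not any provider's tokenizer):
# fragments of up to four letters, single digits, and single punctuation marks.
TOY_PATTERN = re.compile(r"[A-Za-z]{1,4}|\d|[^\sA-Za-z\d]")


def toy_tokens(text: str) -> list[str]:
    """Split text into small units the way the toy pattern defines them."""
    return TOY_PATTERN.findall(text)


sentence = "Tokenization costs $2.24, surprisingly."
words = sentence.split()
pieces = toy_tokens(sentence)

# The unit count is noticeably higher than the word count.
print(len(words), "words ->", len(pieces), "toy units")
print(pieces)
```

For real counts, use the token-counting tool or endpoint your provider documents rather than a heuristic like this one.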

Reader shortcut: tokens measure volume. Models set the price. Dashboards show the meter. Your workflow determines how fast the meter runs.

The AI spend formula

The most useful mental model is simple, even though provider pricing can get detailed. You are paying for what the model reads, what it writes, which model does the work, and whether repeated context gets cached or discounted.

More precisely: uncached input tokens times the input rate, plus cached input tokens times the cached rate, plus output tokens times the output rate, plus any tool, media, storage, search, or processing-tier fees.

Cost equation

Input	Prompt + files + context
Output	Answer + generated work
Rate	Model price per 1M tokens
Adjustments	Cache, batch, tools, fees

That is why total tokens times one generic price is usually the wrong calculation. Output tokens often cost more than input tokens, cached input can be discounted, and tools or media can add separate charges.
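The spend formula in this section can be sketched as a small function. The `Rates` class, the `estimate_cost` name, and every price below are hypothetical placeholders, not any provider's real price list; the point is only the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical USD prices per 1M tokens (placeholders, not real prices)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def estimate_cost(uncached_input: int, cached_input: int, output: int,
                  rates: Rates, extra_fees: float = 0.0) -> float:
    """Sum each token bucket at its own rate, then add separately billed fees."""
    return (
        uncached_input / 1_000_000 * rates.input_per_m
        + cached_input / 1_000_000 * rates.cached_input_per_m
        + output / 1_000_000 * rates.output_per_m
        + extra_fees  # tools, media, storage, search, and similar charges
    )


# Made-up workload: 2M uncached input, 8M cached input, 0.5M output, $0.25 in fees.
rates = Rates(input_per_m=1.00, cached_input_per_m=0.10, output_per_m=4.00)
cost = estimate_cost(2_000_000, 8_000_000, 500_000, rates, extra_fees=0.25)
print(f"${cost:.2f}")
```

With these made-up numbers, output is under 5% of the tokens but about 40% of the cost, which is the effect a single generic price would hide.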

Illustrative math

A dashboard-style example implies a blended effective rate. If a dashboard shows $2.24 across 117 million tokens, the visible spend works out to about $0.019 per 1M tokens ($2.24 divided by 117 million tokens, scaled to 1M).

That does not prove a provider has a listed $0.019/M model. It could reflect a specific mix of cached input, low-cost model usage, discounts, credits, filters, billing timing, or project attribution.
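The arithmetic behind that blended figure is a single division. The function name below is an illustrative choice; the inputs are the $2.24 and 117 million tokens from the example above.

```python
def blended_rate_per_million(spend_usd: float, total_tokens: int) -> float:
    """Effective USD per 1M tokens implied by a spend total and a token total."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return spend_usd / total_tokens * 1_000_000


rate = blended_rate_per_million(2.24, 117_000_000)
print(f"${rate:.3f} per 1M tokens")
```

The result describes the mix that was billed in that window, so it should not be reused as a forecast for a workload with a different token mix.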

Input, output, cached, and reasoning tokens

The token mix is usually more important than the token total. A workload with mostly cached input on a low-cost model can be inexpensive. A smaller workflow with premium output, long context, and repeated tool calls can cost more.

Token type	What it means	Why it matters
Input tokens	Text or context sent to the model.	Large prompts, chat history, documents, code, and tool results can make inputs grow quickly.
Output tokens	Text or structured content generated by the model.	Output is often priced higher than input, so long answers can move spend faster.
Cached tokens	Repeated context that the provider can reuse.	Caching can reduce cost or latency, but cached tokens are not always free and can still count toward rate limits.
Reasoning tokens	Internal thinking or reasoning budget exposed by some systems.	These may be billed as output even when the user does not see every token.
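The contrast described in this section can be made concrete with two made-up workloads. All rates and token counts here are hypothetical, and the sketch assumes reasoning tokens are billed at the output rate, which may not hold for every provider.

```python
def cost_usd(mix: dict, rates: dict) -> float:
    """Price each token type separately; rates are USD per 1M tokens."""
    # Assumption: reasoning tokens are billed as output.
    billed_output = mix["output"] + mix.get("reasoning", 0)
    return (
        mix["input"] / 1e6 * rates["input"]
        + mix.get("cached", 0) / 1e6 * rates["cached"]
        + billed_output / 1e6 * rates["output"]
    )


# Hypothetical price points, not real list prices.
low_cost_rates = {"input": 0.10, "cached": 0.01, "output": 0.40}
premium_rates = {"input": 5.00, "cached": 0.50, "output": 20.00}

# Workload A: large volume, mostly cached input, low-cost model.
mix_a = {"input": 5_000_000, "cached": 94_000_000, "output": 1_000_000}
# Workload B: one tenth of the volume, premium model, heavy output and reasoning.
mix_b = {"input": 6_000_000, "output": 3_000_000, "reasoning": 1_000_000}

cost_a = cost_usd(mix_a, low_cost_rates)
cost_b = cost_usd(mix_b, premium_rates)
print(f"A: {sum(mix_a.values()):,} tokens, ${cost_a:.2f}")
print(f"B: {sum(mix_b.values()):,} tokens, ${cost_b:.2f}")
```

Under these assumptions the smaller workload costs far more than the larger one, because its tokens sit in the expensive buckets.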

Requests are not tokens

An API request is a call to a model. It says how often an app asked for work, not how large the work was. One request can be a tiny prompt; another can carry a huge context window with documents, chat history, tools, and output instructions.

Scenario	What the dashboard shows	Common mistake
1,000 short prompts	Many requests, relatively low tokens.	Assuming request count alone means high cost.
10 long document analyses	Few requests, high tokens.	Assuming a short activity list means low cost.
Agentic coding session	Requests, tool calls, file context, retries.	Ignoring background work the user does not see.
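A quick sketch of the first two rows shows why request count and token volume can point in opposite directions. The tokens-per-request figures are invented for illustration.

```python
# Hypothetical workloads; tokens_per_request values are illustrative guesses.
workloads = {
    "1,000 short prompts": {"requests": 1_000, "tokens_per_request": 200},
    "10 long document analyses": {"requests": 10, "tokens_per_request": 150_000},
}

totals = {}
for name, w in workloads.items():
    totals[name] = w["requests"] * w["tokens_per_request"]
    print(f"{name}: {w['requests']:,} requests, {totals[name]:,} tokens")
```

Here the workload with 100 times fewer requests uses 7.5 times more tokens, so request count alone is a poor proxy for cost.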

How to read a usage dashboard

Usage dashboards vary by provider, but the same families of labels appear again and again. The important move is to separate volume metrics from billing metrics and throughput limits.

Dashboard label	What it means	What to check first
Monthly spend	Estimated or finalized spend for the selected billing window.	Date range, project filter, billing lag, and whether credits are applied.
Balance	Prepaid funds or credits remaining.	Whether the balance is a top-up, free credit, or workspace allocation.
Tokens	Total counted usage units.	Input/output split, model choice, cached share, and media/tool fees.
Rate limits	Throughput rules such as requests or tokens per minute.	Whether a slowdown or error comes from a throughput cap rather than a spending cap; budgets and rate limits are different controls.

Where costs hide

The most common surprise is duplicated context. If an app sends the same long document, codebase, or conversation history to the model repeatedly, usage can climb even when the visible chat looks small.

Premium models used for every request, including simple tasks.
Long chat history or files resent on each turn.
Generated output that is much longer than the original prompt.
Background agents that retry, search, call tools, or inspect files repeatedly.
Images, audio, video, search, code execution, storage, or other separately billed features.
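The duplicated-context effect is easy to underestimate, so here is a minimal sketch of a chat app that resends the full history on every turn. The function name and all token counts are assumptions, and the sketch ignores caching, which some providers would apply to the repeated prefix.

```python
def cumulative_input_tokens(turns: int, system_tokens: int,
                            user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed when each turn resends the whole conversation."""
    total = 0
    history = 0  # tokens from earlier user messages and model replies
    for _ in range(turns):
        # Each request carries the system prompt, all prior turns, and the new message.
        total += system_tokens + history + user_tokens
        # After the reply, both the message and the reply join the history.
        history += user_tokens + reply_tokens
    return total


turns = 20
billed_input = cumulative_input_tokens(turns, system_tokens=1_000,
                                       user_tokens=100, reply_tokens=400)
typed_by_user = turns * 100
print(f"user typed {typed_by_user:,} tokens; model read {billed_input:,} input tokens")
```

Input grows roughly with the square of the number of turns, which is why summarizing or trimming history tends to matter more as sessions get longer.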

Subscriptions and API billing are different

Many consumer AI products charge a flat monthly subscription for access. API dashboards are different: they usually show metered usage by model, token type, request volume, project, or billing period.

A subscription price does not reveal the exact cost of the underlying model call, and an API bill does not behave like a flat monthly app plan.

How to control spend without tracking every token

The best controls are workflow controls. Route simple work to cheaper models, summarize long context before resending it, watch output length, set budgets and alerts, use caching when available, and batch non-urgent jobs when latency is not important.

Cost lever	Why it matters	Practical control
Model choice	Different models can have very different rates.	Use premium models only where quality actually changes the result.
Context size	Repeated documents and histories multiply quickly.	Summarize, retrieve only relevant chunks, and avoid resending stable context.
Output length	Output tokens are often more expensive.	Ask for concise answers, structured summaries, or limits where appropriate.
Processing tier	Batch or flex modes can discount non-urgent work.	Move offline analysis and bulk jobs out of real-time paths.
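Several of these levers can be applied as a routing decision before a request is sent. The sketch below is one possible shape, not a recommended policy: the `Task` fields, the model names `small-model` and `premium-model`, the tier labels, and the 50,000-token threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt_tokens: int          # estimated size of the context to send
    needs_deep_reasoning: bool  # does quality actually change the result?
    urgent: bool                # does a user wait on the answer?


def route(task: Task, long_context_threshold: int = 50_000) -> dict:
    """Pick a model, a processing tier, and whether to shrink context first."""
    return {
        # Model choice: premium only where it changes the result.
        "model": "premium-model" if task.needs_deep_reasoning else "small-model",
        # Processing tier: non-urgent work can go to a discounted batch path.
        "tier": "realtime" if task.urgent else "batch",
        # Context size: summarize or retrieve chunks before sending large inputs.
        "summarize_first": task.prompt_tokens > long_context_threshold,
    }


plan = route(Task(prompt_tokens=80_000, needs_deep_reasoning=False, urgent=False))
print(plan)
```

Keeping the rules in one function makes the cost policy visible and testable, instead of leaving it scattered across individual call sites.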

Source notes and caveats

Source data last checked: Apr 27, 2026. Specific prices should be timestamped and checked against official provider pages before publication. Tokenizers differ between providers, and dashboard figures can vary with the UTC date range, project or organization filters, Scale Tier attribution, and how usage versus costs are recorded. Subscriptions are not directly comparable to API billing.

Source	Type
OpenAI API pricing	Official
OpenAI usage dashboard	Help
OpenAI costs endpoint	Docs
OpenAI prompt caching	Docs
Anthropic pricing	Official
Claude pricing docs	Docs
Gemini API pricing	Docs
Gemini token counting	Docs
DeepSeek pricing	Docs