Token Cost Console
Build or buy your tokens? Compute the self-hosted cost per million tokens — GPU cost ÷ tokens produced — against an API's list price, find the break-even volume, and see why utilization decides it, in any currency.
Throughput, GPU cost & API price → cost per million tokens.
Build-vs-buy console
At 2,500 tok/s and 80% utilization, your GPU produces 7.2M tokens/hour for $3.07/hr: $0.43 per 1M tokens vs the API's $3.00. Above ~1,943M tokens/month (the volume at which the per-token saving covers fixed ops), self-hosting wins.
Self-hosted cost = hourly GPU cost ÷ tokens/hour, so utilization is the hinge — an idle GPU has a terrible cost per token.
Raise throughput via batching (see LLM Serving); decide own-vs-rent in Accelerator ROI.
Currency conversion uses indicative rates — verify against a live source for contracts.
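The console figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the console's actual implementation: the function names are hypothetical, and the $3.07/hr figure is assumed to be the all-in hourly cost (GPU plus power).

```python
def tokens_per_hour(throughput_tok_s: float, utilization: float) -> float:
    """Tokens actually produced in one hour at the given utilization (0..1]."""
    return throughput_tok_s * 3600 * utilization


def self_hosted_cost_per_million(hourly_cost: float,
                                 throughput_tok_s: float,
                                 utilization: float) -> float:
    """All-in hourly cost divided by tokens produced, scaled to 1M tokens."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / tokens_per_hour(throughput_tok_s, utilization) * 1_000_000


# Inputs taken from the console example; hourly cost assumed to be all-in.
hourly_tokens = tokens_per_hour(2500, 0.80)
cost = self_hosted_cost_per_million(3.07, 2500, 0.80)
print(f"{hourly_tokens / 1e6:.1f}M tokens/hour, ${cost:.2f} per 1M tokens")
# prints "7.2M tokens/hour, $0.43 per 1M tokens"
```

Keeping utilization as an explicit argument makes it easy to see how the per-token figure moves when the GPU is not kept busy.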
Why volume decides build vs buy
At high utilization, the per-token cost of running your own (or rented) GPUs is a fraction of API list prices — because the API price bundles the provider's margin, overhead and convenience. Volume is what unlocks the saving.
Token APIs price for convenience: no infrastructure, instant scale, no ops. That premium is genuinely worth it below a break-even volume — but above it, self-hosting's raw cost dominates.
Self-hosted cost per token is the GPU's hourly cost divided by the tokens it produces — so an idle GPU has a terrible cost-per-token. Only steady, high-throughput serving makes owning cheaper than the API.
Token APIs quote prices per million tokens, while infrastructure is billed per GPU-hour. Converting your self-hosted figure to dollars per million tokens is the only honest way to compare it against the API menu.
The price of a million tokens
Every LLM deployment eventually faces the same question: keep paying an API per token, or stand up your own serving? The answer lives on a single axis — cost per million tokens — and the honest comparison puts your self-hosted figure next to the API's menu price on that exact axis. Self-hosted cost is disarmingly simple: the all-in hourly cost of the serving GPU divided by the tokens it actually produces in that hour.
That denominator is everything. A GPU's hourly cost is roughly fixed whether it's flat out or idle, so cost per token is inversely proportional to both utilization and throughput. Run it hard, batching many requests, and the cost per token drops to a fraction of any API price. Run it at twenty percent, and each token costs five times what it does at full utilization, often worse than the API. This is why self-hosting is a high-volume game and the API is the right answer for everyone below the threshold.
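The inverse relationship is easy to check numerically. The sketch below sweeps utilization using the console's example inputs ($3.07/hr, 2,500 tok/s, $3.00 API price). The helper names are hypothetical, and the parity calculation deliberately ignores fixed operating costs, which push the real threshold higher.

```python
def cost_per_million(hourly_cost: float, throughput_tok_s: float,
                     utilization: float) -> float:
    """Self-hosted cost per 1M tokens at a given utilization (0..1]."""
    return hourly_cost / (throughput_tok_s * 3600 * utilization) * 1_000_000


def utilization_at_parity(hourly_cost: float, throughput_tok_s: float,
                          api_price_per_million: float) -> float:
    """Utilization at which self-hosted cost per 1M equals the API price.

    Ignores fixed ops, so this is a lower bound on the utilization needed.
    """
    return hourly_cost / (throughput_tok_s * 3600 * api_price_per_million / 1_000_000)


HOURLY_COST, THROUGHPUT, API_PRICE = 3.07, 2500, 3.00  # console example inputs

full = cost_per_million(HOURLY_COST, THROUGHPUT, 1.0)
for util in (1.0, 0.8, 0.5, 0.2, 0.1):
    cost = cost_per_million(HOURLY_COST, THROUGHPUT, util)
    print(f"util {util:>4.0%}: ${cost:.2f}/1M ({cost / full:.1f}x full-utilization cost)")

parity = utilization_at_parity(HOURLY_COST, THROUGHPUT, API_PRICE)
print(f"per-token parity with the API at about {parity:.1%} utilization")
```

The multiplier column is simply 1 ÷ utilization, which is the whole argument in one number.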
The API's price is higher per token for good reasons: it bundles the provider's margin, their infrastructure and operations, and the genuine value of zero setup and instant elastic scale. Below a break-even volume, that bundle is a bargain, because you'd waste far more on underused hardware. Above it, you're paying the markup on billions of tokens, and the raw economics of owning the serving dominate. The break-even, which this console computes, is the monthly volume at which the per-token saving, multiplied by that volume, equals the fixed operating costs.
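The break-even described above can be sketched as fixed monthly cost divided by the per-token gap. The function names are hypothetical, and the $5,000/month fixed-ops figure is an assumption chosen only because it reproduces the console's ~1,943M tokens/month example; the page does not state the actual value.

```python
from typing import Optional


def break_even_million_tokens(fixed_ops_per_month: float,
                              api_price_per_million: float,
                              self_cost_per_million: float) -> Optional[float]:
    """Monthly volume (in millions of tokens) where self-hosting matches the API.

    Returns None when self-hosting is not cheaper per token, because no
    volume can then recover the fixed costs.
    """
    gap = api_price_per_million - self_cost_per_million
    if gap <= 0:
        return None
    return fixed_ops_per_month / gap


def monthly_totals(volume_millions: float, fixed_ops_per_month: float,
                   api_price_per_million: float, self_cost_per_million: float):
    """Return (self_hosted_total, api_total) in dollars for a monthly volume."""
    self_total = fixed_ops_per_month + volume_millions * self_cost_per_million
    api_total = volume_millions * api_price_per_million
    return self_total, api_total


FIXED_OPS = 5000.0        # assumed $/month, not stated on the page
API_PRICE = 3.00          # $/1M tokens, from the console example
SELF_COST = 3.07 / 7.2    # $/1M tokens: $3.07/hr over 7.2M tokens/hour

volume = break_even_million_tokens(FIXED_OPS, API_PRICE, SELF_COST)
print(f"break-even: about {volume:.0f}M tokens/month")
# prints "break-even: about 1943M tokens/month"
```

Below that volume the API total is lower; above it the self-hosted total is lower, and the gap widens linearly with volume.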
So the decision is really about your steady volume and how hard you can drive utilization. Push throughput up with batching — modeled in the LLM Serving console — to lower self-hosted cost per token, and decide whether to own or rent the GPUs in the Accelerator ROI console.
Trusted by AI Product & Inference Cost Teams
“Self-hosted cost per million tokens against the API list price, with the break-even volume, is exactly the build-vs-buy analysis I run. Seeing our 70B self-host at $0.45 vs $3 API — but only above the break-even — is the decision in one screen, in dollars and euros. Utilization as the hinge is correctly central.”
“The markup-on-the-API-bundles-convenience framing is the truth, and this quantifies exactly when that convenience stops being worth it. Batching raising throughput to cut cost-per-token is the lever we pull. Pairs perfectly with the serving and accelerator-ROI tools.”
“Clean cost-per-Mtok with the break-even monthly volume — that's the number finance approves on. Currency support saves a spreadsheet. Would love demand-variability modeling, but as a build-vs-buy decision tool it's exactly right.”
“The low-volume preset correctly says stay on the API — don't buy idle GPUs. Knowing the token volume we'd need to justify self-hosting is the reality check. Honest about utilization, fast, multi-currency. Excellent.”
Related tools
Similar Calculators
More tools in the same category
Inference Cost Calculator
Estimate deployment costs for AI models across cloud, edge, and hybrid infrastructures with per-query, per-token, and per-hour pricing models. Integrates GPU/ASIC rental rates, network egress, storage, and scaling overhead for accurate inference TCO analysis.
Training Cost Calculator
Calculate AI model training expenses including GPU cluster rental, data transfer, checkpoint storage, and engineering time with distributed-training overhead modeling. Supports LLM, vision, and multimodal training with FLOPs-to-cost mapping and carbon-footprint estimation.
GPU Cluster Sizing
Determine optimal GPU cluster configurations for training and inference workloads with interconnect topology modeling, memory-bandwidth balancing, and fault-tolerance planning. Supports NVIDIA, AMD, and custom accelerator clusters with InfiniBand and NVLink network analysis.
Model Fit Checker
Verify whether AI models fit within hardware constraints including GPU HBM capacity, on-chip SRAM, and interconnect bandwidth with layer-wise memory profiling. Supports model parallelism, pipeline parallelism, and ZeRO optimization recommendations for large-model deployment.
HBM Bandwidth Calculator
Estimate memory bandwidth requirements for AI workloads with operation-type analysis, data-movement profiling, and roofline model integration. Calculates HBM generation selection, channel count, and clock-speed requirements to eliminate memory-bound bottlenecks.
AI Chip Comparator
Compare AI accelerators across performance, cost, power, and software-ecosystem metrics with normalized benchmarking for training and inference workloads. Supports NVIDIA, AMD, Intel, Google TPU, Amazon Trainium, and custom ASICs with TCO-per-FLOP analysis.
Self-hosted cost per 1M tokens = (GPU $/hr + power $/hr) ÷ (throughput tok/s × 3600 × utilization) × 10⁶
Break-even volume (M tokens/month) = fixed ops $/month ÷ (API price − self-hosted cost, both per 1M tokens)
Last reviewed: 2026-06