Stack Scout

AI Cost Per Token Dropped 67% — So Why Are Enterprise Bills Tripling?

data center server racks - Close-up of computer server rack components

Photo by Đào Hiếu on Unsplash

Key Takeaways
  • As of July 1, 2026, the Silicon Data LLM Token Expenditure Index recorded $1.66 per million tokens — nearly 20% below its May 2026 high — signaling structural commoditization of AI inference.
  • Blended token costs across 2.4 billion API calls fell 67%, from $18.40 to $6.07 per million tokens, between Q1 2025 and Q1 2026. Enterprise bills still tripled over the same stretch.
  • The AI capex-to-revenue gap has reached 46%, already exceeding the 32% divergence recorded during the 2001 telecom bubble, with a $600 billion annual revenue shortfall across major hyperscalers.
  • 72% of enterprise AI projects exceed their original budget by at least 30%, driven largely by agentic workflows that multiply token usage 50–500x per task.

What Happened

$1.66 per million tokens. That is where the Silicon Data LLM Token Expenditure Index landed on July 1, 2026 — nearly 20% below its May 2026 high. According to Bloomberg reporting from July 3, 2026, Citadel macro strategist Andreas Steno Larsen called it "the chart everyone should be watching," warning that sustained weakness in token pricing could unwind the memory, hardware, and data-center investment thesis that has dominated market positioning for the past two years. Google News surfaced the underlying HarianBasis.co analysis on July 4, 2026, framing the index move as a signal of broader AI market maturation rather than a temporary blip.

The surface story is straightforward: per-token AI costs have dropped over 90% since 2023. The operational story for teams running AI-assisted workflows is considerably messier. Blended token costs across 2.4 billion API calls — the data pipeline connecting apps to AI models — fell 67%, from $18.40 to $6.07 per million tokens, between Q1 2025 and Q1 2026. Enterprise AI bills, meanwhile, tripled over the same period. Total AI spending doubled since late 2025. Token consumption grew 450% between December 2024 and December 2025 while token costs were simultaneously being halved. Cheaper per unit. Vastly more units consumed. The math is not working in finance departments' favor — and as of 2026, AI spending now represents 41% of total IT budgets, up from a fraction of that share just three years ago.

The Job You Are Hiring AI Tools to Do — And Where the Real Cost Hides

To understand the billing paradox, apply a simple frame: what specific job is AI being hired to do inside your operation, and how bounded is that task?

First-generation AI adoption assigned language models to discrete manual tasks — drafting emails, summarizing documents, writing first-draft code. Token usage was bounded: one input, one output, task complete. The economics felt manageable.

Second-generation adoption involves agentic workflows — multi-step automated sequences where an AI agent plans, executes, evaluates results, and loops. A single "compile our quarterly competitive analysis" prompt that triggers a research agent, a synthesis agent, a fact-checking agent, and a formatting agent does not consume one transaction worth of tokens. It consumes 50 to 500 times more, per the research data. As covered in the analysis of agentic AI's rapid mainstream adoption, this task-scope expansion is outrunning most enterprise budget models by a wide margin.

The moment you outgrow single-prompt use cases and enter agentic territory, your original cost model breaks. Uber recognized this pressure: as of 2026, the company imposed a $1,500 per employee per month spending cap on AI coding tools — a hard signal that "tokenmaxxing" experimentation has given way to disciplined tokenomics with ROI accountability. Most small and mid-sized businesses lack even that governance infrastructure, let alone the usage-log visibility to know they need it.

The vendor pricing shift compounds this. Hybrid models — a base subscription plus usage overage charges — became the industry standard, adopted by 41% of AI vendors in 2026, up from 27% in 2025. Pure per-seat pricing fell from 21% to 15% of the market. Translation: the all-inclusive pricing that made early AI productivity software feel budget-safe has been quietly phased out across the majority of the ecosystem. Citadel Securities stated it directly: "The core constraint on AI adoption has shifted from best model performance to cost and scarcity, with users rapidly migrating toward cheaper alternatives."

developer laptop API screen - turned on gray laptop computer

Photo by Luca Bravo on Unsplash

Why the Capex Gap Matters Even for Small Teams

The macro picture has downstream consequences for every team signing an AI tool contract, because vendor pricing is a function of hyperscaler economics.

As of 2026, the five largest hyperscalers — Amazon, Microsoft, Alphabet, Meta, and Oracle — are projected to spend $700–900 billion on capital expenditures, a 36% increase over 2025. Gartner's official forecast, as of July 4, 2026, puts worldwide AI spending at $2.59 trillion for 2026, a 47% year-over-year increase. Global corporate AI investment reached $581.7 billion in 2025, with private investment growing 127.5% to $344.7 billion.

The structural problem: the AI capex-to-revenue gap has reached 46%. The 2001 telecom bubble peaked at a 32% divergence. The current annual shortfall stands at $600 billion. JPMorgan has publicly dismissed the token price pullback as a "minor speed bump" in the AI investment cycle. The Bloomberg analysis from July 3, 2026 documented the more bearish read from Steno Larsen — that the pricing decline could signal a structural shift in the hardware and data-center trade, not just a temporary dip in sentiment. Both views may be partially right, which is precisely why the divergence deserves attention rather than dismissal.

Blended AI Token Cost Per Million TokensQ1 2025 vs Q1 2026 — across 2.4 billion API calls$0$6$12$18$18.40Q1 2025$6.07Q1 2026↓ 67% drop in unit cost

Chart: Blended AI token cost per million tokens fell 67% — from $18.40 to $6.07 — between Q1 2025 and Q1 2026, across 2.4 billion API calls. Enterprise bills tripled over the same period as consumption volume surged. Source: Silicon Data LLM Token Expenditure Index research data, as of July 4, 2026.

What this means for productivity software buyers specifically: several major providers are currently pricing inference below their own cost of serving. OpenAI, for instance, spends nearly $2 for every $1 earned on inference — burning capital to capture market share while suppressing prices artificially. The Stanford HAI AI Index 2026 reported that U.S. consumer surplus from AI reached $172 billion annually by early 2026, up from $112 billion a year earlier — real value being delivered to real users. But that surplus is partly a function of below-cost pricing that has a ceiling. When the subsidy ends or shifts, vendor pricing will move.

Before You Commit: The Switching Cost Reality

The switching cost in AI tooling is less visible than in traditional SaaS, and the demo is not the product. It is not primarily about data portability or API migration scripts. It is about the prompt engineering investment your team has made, the workflow orchestration depth baked into your current platform, and the institutional knowledge built around a specific model's behavior patterns. If a team has spent three months tuning an agentic research workflow in one vendor's environment, the real cost of switching is that accumulated context — not the cancellation clause.

Three steps worth taking before any major AI tool commitment right now:

1. Audit your actual token burn rate before signing any hybrid contract.

Pull usage logs from current AI tools and calculate cost per completed workflow — not per seat. If agentic sequences are in use, multiply baseline token estimates by at least 50x to get a realistic cost floor. As of 2026, 72% of enterprise AI projects exceeded their original budget by at least 30%, largely because agentic task expansion was not modeled at contract time. Ask vendors for token consumption benchmarks from comparable deployments before committing.

2. Track cost-per-outcome, not cost-per-token.

Inference — the computational process of running a model to generate outputs — now accounts for 60–70% of total AI compute demand across major hyperscalers, up from roughly 40% in 2024. That compute cost gets passed downstream into vendor pricing. A tool that costs three times more per token but resolves a workflow in a single pass is cheaper than a lower-cost tool that iterates twelve times. Define what "done" looks like for the specific workflow you are automating before adoption, not after onboarding.

3. Negotiate pricing-model flexibility and spending caps into any enterprise contract.

Hybrid pricing — base subscription plus usage overage — is the new default for 41% of AI vendors in 2026. That structure creates billing surprises as usage scales. Negotiate rate caps on overage charges or monthly spending ceilings tied to outcome SLAs (service level agreements — contractual performance guarantees). Uber's $1,500 per employee monthly cap on coding tools is one model; a tiered overage alert system with automatic throttling is another. Either way, the time to negotiate terms is before your agentic workflows go into production, not after the first surprise invoice arrives.

In my analysis, the most important signal from the Silicon Data index decline is not what it says about token pricing in isolation — it is what it reveals about the team-size cliff in AI adoption. Small teams running contained, single-prompt workflows will benefit from falling costs. Teams that have crossed into agentic territory without governance frameworks are already experiencing the bill tripling that enterprise data confirms. The data makes that distinction clearer than most vendor marketing will.

Frequently Asked Questions

Why are AI costs falling but total enterprise AI spending keeps rising?

The drop in per-token costs has been outpaced by a 450% surge in token consumption between December 2024 and December 2025. Cheaper tokens unlock new use cases — especially agentic workflows that run 50–500 times more tokens per task than simple prompt-and-response interactions. This mirrors Jevons Paradox from 19th-century economics: efficiency gains increase total consumption rather than reducing it. As of Q1 2026, enterprise AI bills tripled even as blended token costs fell 67%, and total AI spending doubled since late 2025.

How much does AI cost per token in 2026, and which index tracks it?

As of July 1, 2026, the Silicon Data LLM Token Expenditure Index — which Bloomberg and Citadel strategist Andreas Steno Larsen have both highlighted as a key market signal — recorded $1.66 per million tokens, nearly 20% below its May 2026 high. Across a broader sample of 2.4 billion API calls, blended costs averaged $6.07 per million tokens in Q1 2026, down from $18.40 in Q1 2025, a 67% decline. Individual model pricing varies significantly by provider and model tier.

What is the AI expenditure gap between capex and revenue, and does it affect my software costs?

As of 2026, the AI capex-to-revenue gap stands at 46% — meaning infrastructure investment is running well ahead of revenue generated across major hyperscalers, with an estimated $600 billion annual shortfall. This exceeds the 32% divergence observed during the 2001 telecom bubble. For enterprise buyers, it matters because several providers — including OpenAI, which reportedly spends nearly $2 for every $1 earned on inference — are pricing below cost to capture market share. When that subsidy corrects, pricing will shift. Planning for a higher cost baseline in 18–36 months is prudent.

Is AI pricing sustainable for small business budgets, given the shift to hybrid pricing models?

Current pricing is partially subsidized by venture capital and hyperscaler capex spending. The shift to hybrid pricing — adopted by 41% of AI vendors in 2026, up from 27% in 2025, while pure per-seat pricing fell from 21% to 15% — signals that all-inclusive predictable billing is being phased out. For small teams, the near-term risk is not sudden price spikes but billing unpredictability from usage-based overages. Negotiating monthly spending caps and monitoring token burn rates before agentic workflows scale are practical hedges against that exposure.

Disclaimer: This article is editorial commentary based on publicly reported information and is for informational purposes only. Tool features and pricing may change. Always verify current details on official vendor websites. Research based on publicly available sources current as of July 4, 2026.