Bipko Biz Digital News


What you'll pay for AI agents will be wildly variable and unpredictable

May 13, 2026  Twila Rosenbaum

Among all the challenges of implementing agentic artificial intelligence, the least-understood issue is cost. The providers of AI, such as OpenAI, Google, and Anthropic, have price lists, but none of those listed prices tell users what the final bill will be to actually solve a problem. A new study from the University of Michigan and collaborating institutions—including Stanford, All Hands AI, Google DeepMind, Microsoft, and MIT—provides the first systematic analysis of token consumption in AI agents, and the findings are sobering.

Lead author Longju Bai and his team found that agents consume thousands of times more tokens than simple, turn-by-turn chat interactions. In some cases, an agentic task consumes 3,500 times as many tokens as a standard ChatGPT session. A token, the basic unit of information processed by a language model, can be a fragment of a word or a punctuation mark. But the study reveals more than just high usage: the variability and unpredictability of that usage are the real problems.

Token costs vary wildly between models

The researchers tested several leading models using the open-source framework OpenHands and the SWE-Bench coding benchmark. OpenAI's ChatGPT 5 and 5.2 achieved strong accuracy at moderate cost, while Anthropic's Claude Sonnet-4.5 scored highest in accuracy but consumed far more tokens. Google's Gemini-3-Pro landed in the middle, and Moonshot's Kimi-K2 was the most token-hungry for the lowest accuracy. The differences are not due to task difficulty; rather, they reflect each model's behavioral tendencies—some simply burn through more tokens solving the same problem.

Even more troubling, the same model can double its token consumption when tackling the identical task on different occasions. The study notes that "the most expensive runs double the token and monetary cost of the least expensive runs," highlighting a huge variance even within a single model. This makes budgeting for agentic AI nearly impossible.

Input tokens dominate the cost

The study breaks down usage into three token types: input (context supplied by the user or by tools), output (the model's generated responses), and cached (previously processed context that is read back on later turns). Strikingly, input tokens dominate agentic coding costs, far outpacing output tokens, and most of that input is cached context being re-read. The reason is architectural: agentic workflows repeatedly feed the same context back into the model, so costs accumulate every time the agent retrieves information from its memory. "Cache reads dominate both raw token volume and dollar cost," the authors write, identifying this cumulative reuse of prior context as the single biggest expense.
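The arithmetic behind this breakdown is straightforward to sketch. The prices and token counts below are hypothetical placeholders, not figures from the study or from any provider's price list; they merely illustrate how a long agent run's bill can be dominated by cached input reads even when each cached token is cheap.

```python
# Hypothetical per-million-token prices, for illustration only.
PRICE_PER_MILLION = {
    "input": 3.00,    # fresh input tokens (user prompts, tool results)
    "cached": 0.30,   # cached context re-reads, typically discounted
    "output": 15.00,  # generated tokens
}

def run_cost(tokens: dict) -> float:
    """Dollar cost of one agent run, given token counts by type."""
    return sum(count * PRICE_PER_MILLION[kind] / 1_000_000
               for kind, count in tokens.items())

# A long agentic run re-feeds the same context on every step, so the
# cached-read volume (hypothetical numbers) dwarfs everything else.
example = {"input": 2_000_000, "cached": 40_000_000, "output": 300_000}
print(f"${run_cost(example):.2f}")  # cached reads alone cost $12.00 here
```

Even at a tenth of the fresh-input rate, the cached reads in this toy example cost more than the fresh input and output combined, which is the pattern the study describes.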

This finding has practical implications. Users can control some input factors—prompt size, context window length, and the number of tool calls—but the core architecture of agents makes them inherently input-heavy. Predicting these costs is a major challenge.

Agents can't predict their own costs

The researchers asked each agent to estimate how many tokens it would need to complete a given task. All models consistently underestimated their requirements, especially for input tokens. Predictions remained compressed even as real usage grew into the millions of tokens. The study states that "models systematically underestimate the tokens they need," making it impossible for users to get a reliable upfront estimate.

Furthermore, more tokens do not guarantee better results. The study found that accuracy often peaks at intermediate cost levels and saturates or even declines at higher costs. Agents tend to keep searching and retrying on unsolvable problems, accumulating expense without progress. "Models lack a reliable mechanism to recognize when a task is unsolvable and stop early," the authors note.

Implications for enterprise adoption

The lack of cost transparency and performance guarantees creates significant risks for businesses. Enterprises need predictable budgets for software investments, but current pricing models from AI providers push the burden onto users to run repeated experiments just to estimate average costs. A company might pay for dozens of agent runs before knowing what a typical task will cost. Worse, there is no guarantee that the agent will complete the task at all, even after burning through thousands of tokens.

This situation mirrors the early days of cloud computing, when usage was unpredictable and costs spiraled. Cloud providers eventually introduced usage alerts, budgets, and reserved instances; the AI industry is still in its infancy regarding cost management. The study's authors suggest that even coarse-grained estimates could help, such as budget alerts before launching expensive runs. But true cost predictability remains elusive.
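A budget alert of the kind the authors suggest could be approximated on the client side today. The sketch below is a minimal illustration under the assumption that the provider's API reports token usage after each call; the class, prices, and per-step token counts are all hypothetical, not any vendor's actual interface.

```python
# Minimal client-side budget guard for an agent loop (illustrative only).
class BudgetGuard:
    def __init__(self, budget_usd: float, price_per_million: float):
        self.budget_usd = budget_usd
        self.price_per_million = price_per_million
        self.spent_usd = 0.0

    def record(self, tokens_used: int) -> None:
        """Accumulate the cost of a completed step."""
        self.spent_usd += tokens_used * self.price_per_million / 1_000_000

    def should_stop(self) -> bool:
        """Check the budget before launching the next step."""
        return self.spent_usd >= self.budget_usd

guard = BudgetGuard(budget_usd=5.00, price_per_million=3.00)
for step_tokens in [400_000, 600_000, 900_000]:  # hypothetical per-step usage
    if guard.should_stop():
        break  # alert/stop before starting another expensive step
    guard.record(step_tokens)
print(f"spent ${guard.spent_usd:.2f} of ${guard.budget_usd:.2f}")
```

Note the inherent limitation, which echoes the study's findings: because a step's cost is only known after it runs, the guard can still overshoot the budget by up to one step, and since agents cannot reliably predict their own token needs, there is no trustworthy pre-run estimate to check against instead.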

The problem is compounded by the rapid adoption of agentic coding tools like Replit, Lovable, Cursor, and others. Users report that the meter is constantly running, and the final bill often surprises developers. Until vendors provide transparent pricing models and some form of success guarantee, enterprise adoption will be hampered by uncertainty and sticker shock.

Users collectively will need to push back on OpenAI, Google, Anthropic, and other providers to demand reliable cost estimation and task completion guarantees. Without such safeguards, agentic AI may remain a niche experiment rather than a stable enterprise tool. The study from Bai and his colleagues serves as a crucial wake-up call: the cost of agents is far from predictable, and the industry must address this before agentic workflows become mainstream.


Source: ZDNET News

