The Year Models Commodified
When AI stops being the differentiator and starts being the substrate
The magic gets cheap. Then it gets very cheap. Then someone from IBM walks onto a stage and admits the spell never mattered. It was the wand all along.
Google cut Gemini prices by 78%. Anthropic followed with a 67% cut. The cost to match a frontier system on standard AI benchmarks collapsed from $4,500 to $11.64 in a year. These aren’t competitive adjustments. They’re the sound of a premium evaporating.
Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, put it this way: “We’re going to hit a bit of a commodity point. It’s a buyer’s market. You can pick the model that fits your use case just right and be off to the races. The model itself is not going to be the main differentiator.”
Not a critic. Not a short seller. IBM’s own chief AI architect, describing 2026 as the year the tool stops being the product and starts being the substrate. A utility. The thing you stop noticing because it’s just there.
Every technology that has ever mattered has ended up here.
Nicholas Carr said this out loud in 2003. “IT Doesn’t Matter” argued that technology stops generating advantage the moment it becomes ubiquitous. Intel called him wrong. Microsoft called him wrong. History treated him gently. The railroad barons didn’t define the century that followed; the companies that figured out how to ship goods on other people’s tracks did. We are watching the same movie, but the trains are now hallucinating.
Carr was describing IT broadly: servers, networks, software licenses. But the mechanism he identified was precise. Competitive advantage doesn’t travel with the technology itself. It travels with what organizations do with it before everyone else has access. Once everyone has it, the premium for having it disappears. What remains is the penalty for not having it.
“Scarcity, not ubiquity,” Carr wrote, “makes a business resource truly strategic.”
The model is becoming the railroad. The question is whether we’re building the freight network or still selling track.
Commodities announce themselves with price tags, not press releases. Nobody theorizes about copper becoming a commodity. Copper prices fall and you watch.
AI model markets are pricing like commodities.
Google slashed Gemini input prices by 78% in mid-2024, a cut so aggressive it was obviously defensive. Anthropic followed, cutting its flagship product by 67%. OpenAI has repeatedly reduced GPT-4 pricing over the past 18 months in a pattern that looks less like considered market strategy and more like a race with a finish line nobody wants to cross first.
Then DeepSeek. In January 2025, a Chinese research lab released a model whose performance on standard benchmarks was comparable to frontier American systems, trained for an estimated $5.6 million. Nvidia’s stock fell roughly 17% in a single day. Not because DeepSeek beat GPT-4. It was close enough at a fraction of the cost, and that fraction told investors everything they needed to know about where pricing was headed.
By the end of 2025, the cost to achieve similar results on AI benchmarks had dropped from approximately $4,500 per task to $11.64. That’s not incremental improvement. That’s the collapse of a premium. The companies absorbing these costs are pricing below cost to acquire market position, which is the behavior of commodities, not innovations. OpenAI expects to burn through $14 billion in 2026. Anthropic burned through nearly $3 billion in 2025. Both are subsidizing access to their products because those products are beginning to look like something they have to offer, not something they charge a premium for.
This should be obvious. It isn’t.
Most AI coverage is still a benchmark race. Which system scores highest on MMLU. Which scores best on HumanEval. Which has the largest context window. These numbers are real. They measure genuine capability improvements. They just don’t measure the thing that matters in production, which is whether the architecture we’ve built with a given tool actually performs reliably in a specific context.
METR, an AI evaluation organization, published a study in mid-2025 measuring whether experienced open-source developers were more productive when using AI tools. The sample was limited: sixteen developers working on tasks they knew well. Developers expected AI to speed their work by 24%. The measured result: using these tools, developers completed tasks 19% slower than without them.
Even after experiencing the slowdown, developers still believed AI had made them about 20% faster.
The feeling of productivity, not productivity itself, is now a market segment.
We’ve built a productivity mythology so thick that we work slower while feeling faster. The benchmarks said one thing. Reality said another. In the gap, developers chose the benchmark.
The biggest obstacle to enterprise AI wasn’t capability. It was evaluation: whether the tool was actually performing. Lucidworks surveyed over a thousand companies on their actual AI deployments and found the pattern everywhere: companies assumed their competitors had achieved more mature implementation than they had, while their own deployments faced familiar bottlenecks. They couldn’t measure what they’d built, so they couldn’t improve it.
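What that measurement could look like, in its smallest possible form: a fixed suite of tasks drawn from your own environment, scored by your own success criteria, run against whatever system you actually deploy. The sketch below is illustrative only; `call_system`, `Task`, and the example ticket are stand-ins for the sake of the sketch, not any vendor’s API.

```python
# Illustrative sketch: score a deployed AI system against a fixed suite of
# tasks from your own environment, not a public benchmark.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    passes: Callable[[str], bool]   # domain-specific success check

def evaluate(call_system: Callable[[str], str], tasks: list[Task]) -> float:
    """Return the fraction of in-context tasks the system actually completes."""
    passed = sum(1 for t in tasks if t.passes(call_system(t.prompt)))
    return passed / len(tasks)

# Track the same suite across releases, vendors, and price tiers:
# tasks = [Task("Summarize ticket 4821 for the refund queue",
#               lambda out: "refund" in out.lower())]
# print(evaluate(my_orchestrator, tasks))
```

The suite, not the benchmark, is what tells you whether the deployment improved.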
Executives believed they were behind their peers but couldn’t measure whether they actually were. The competitive anxiety didn’t require evidence. The assumption of others’ superiority arrived pre-formed, driven by the same force that made developers believe they were faster while working slower: we have spent three years treating AI capability as a status signal, and status anxiety operates well below the threshold where measurement would interrupt it. “My competitor has better AI” is the new “my competitor has better consultants.” The belief is the point. The actual performance is almost secondary. If you just thought “this doesn’t describe my organization”: that is exactly what the Lucidworks data says you would think.
We have spent three years writing about which engine is smartest. The organizations paying for these tools are trying to figure out whether the smart engine can do anything useful in their specific environment. These are different questions. We’ve almost exclusively been asking the first one.
When the tool becomes commodity, value migrates up the stack.
IBM calls it “the systems.” Goodhart was specific: rather than talking directly to an AI engine, users are “talking to a software system that includes tools for searching the web, doing all sorts of different individual scripted programmatic tasks, and most likely an agentic loop.” The model becomes a component. The system is the product.
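A rough sketch of that shape: the orchestration loop owns the tools and the control flow, and the model sits behind a generic interface that any vendor can satisfy. Everything here is illustrative; the `Model` protocol, the `TOOL:` convention, and `web_search` are assumptions made for the sketch, not anyone’s actual SDK.

```python
# Minimal sketch of a model-agnostic agentic loop: the model is one swappable
# component; the tools, memory, and control flow belong to the system.
from typing import Protocol, Callable

class Model(Protocol):
    def complete(self, prompt: str) -> str: ...

def agent_loop(model: Model, tools: dict[str, Callable[[str], str]],
               goal: str, max_steps: int = 5) -> str:
    """Let the model request tools until it produces a direct answer."""
    context = goal
    for _ in range(max_steps):
        reply = model.complete(context)
        if reply.startswith("TOOL:"):          # e.g. "TOOL:search current rail freight rates"
            name, _, arg = reply[len("TOOL:"):].partition(" ")
            context += f"\n[{name}] {tools[name](arg)}"
        else:
            return reply                       # the system's answer, not the model's
    return context

# The loop never changes when the model underneath does:
# answer = agent_loop(cheapest_adequate_model, {"search": web_search}, goal)
```

Swap the model and the loop doesn’t notice. That indifference is the whole point.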
Palantir spent 2025 building what it calls Agentic Foundry: an orchestration layer that lets enterprises deploy autonomous agents drawing on whatever underlying technology best fits the task. The architecture is agnostic to which model it runs on. Plug in the capability. Buy the orchestration. Palantir’s value proposition is not that it has better capabilities than OpenAI. It’s that it doesn’t need better capabilities. It needs the infrastructure that makes those capabilities useful in operational contexts.
Anthropic did something similar at protocol level. Its Model Context Protocol creates a standardized way to connect AI agents to tools, data sources, and external services. MCP isn’t the only contender. OpenAI’s function calling conventions and Google’s A2A protocol are competing for the same territory. But Anthropic is currently setting the terms most enterprises are adopting, and standards have inertia that model performance doesn’t.
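The core idea those protocols share is simple enough to sketch: describe each tool once, in a shared schema, so any compliant agent can discover and call it regardless of which model produced the call. The field names below are simplified stand-ins, not the literal MCP or function-calling wire format, and `lookup_order` is a hypothetical tool.

```python
# Illustrative sketch of a standardized tool description and a generic dispatcher.
tool_manifest = {
    "name": "lookup_order",
    "description": "Fetch an order record from the fulfillment system",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}

def dispatch(call: dict, registry: dict) -> str:
    """Route a schema-described tool call to whichever backend registered it."""
    return registry[call["name"]](**call["arguments"])

# registry = {"lookup_order": warehouse_api.lookup_order}
# result = dispatch({"name": "lookup_order",
#                    "arguments": {"order_id": "A-1009"}}, registry)
```

Whichever standard wins, the leverage sits with whoever defines that manifest, not with whoever produces the best completion.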
Deloitte surveyed enterprises in 2025 on AI maturity. Eighty percent believed they had mature capabilities in basic automation. Only 28 percent believed the same for agentic AI. That gap is where competitive advantage will be built. Not in models. In systems.
The laboratories that built these products have quietly acknowledged this, even while their public postures still emphasize frontier capability.
OpenAI, valued near $730 billion as of early 2026 while burning roughly $14 billion annually, now emphasizes product experience as its moat: the conversational memory accumulated across 700 million weekly ChatGPT users, the multimodal capabilities, the trust relationship with consumers. The capability argument is still in the pitch deck. In private, the moat is described as the data, the interface, the accumulated relationship.
Anthropic, valued around $380 billion, leans on protocol positioning and demonstrated superiority in software development contexts. The MCP argument is that controlling the standard for how agents connect to tools is worth more than any single performance advantage.
Both companies are beginning to tell investors: the model isn’t the primary moat. The capability pitch hasn’t disappeared. OpenAI still leads with benchmark performance. But the direction is clear. This is the corporate version of watching a gold rush town decide what it is once the gold runs out. You can become a real city, or you can remain a gold rush town with better signage.
The discourse about AI and jobs has mostly focused on which jobs AI will replace. This is the wrong question. The better question is which expertise becomes worthless before the people holding it can develop new expertise.
Prompt engineering was a career that lasted as long as a mayfly. There were LinkedIn profiles. There were certifications. There were job postings for a skill set that evaporated the moment the tools got smart enough to ignore us. It turns out “talking to the computer” is only a profession while the computer is still stupid.
This pattern will recur at each layer of the stack. The fine-tuning consultants who charged premium rates for model-specific customization in 2024 are already discovering that the general-purpose version of what they were selling has been absorbed into the tools. Domain-specific fine-tuning (medical, legal, manufacturing) will hold value longer. The labor doesn’t disappear. It reconfigures toward whatever the current architecture cannot yet do without human intervention, until it can, at which point a new certification category appears on LinkedIn.
“The machine doesn’t hum on its own. It hums because someone is tending it,” as we explored in examining the maintenance class that keeps AI infrastructure functional. When the technology becomes infrastructure, the maintenance workers don’t become visible or better compensated. They follow the value to the orchestration layer. New substrate. Same arrangement.
This is what “The Quiet Utility” described for systems that have already completed the transition.
Once a technology becomes infrastructure, the companies running it benefit from our forgetting. The spam filter that processes fifteen billion emails daily generates no press releases, no coverage, no accountability: precisely because it works. When AI products complete their transition to infrastructure, the companies orchestrating them will inherit the same dynamic. And so will we: our habits, our workflows, our definitions of what counts as competent work, shaped by the infrastructure until we can’t see what we’ve adapted around.
Infrastructure aspires to invisibility. The companies running it prefer it that way. Your forgetting is their moat. We’ll keep organizing conferences around benchmarks because measuring capability is easier than asking what it’s for. The model is no longer the differentiator. The question is whether we noticed before we became the substrate ourselves.