AI doesn't have a Moore's Law
April 2026 • 1638 words • 10 min read
AI progress isn't one curve doubling on a clock. It's a bundle of partially correlated curves, each running on its own logic, each hitting its own wall. Four curves drive the compounding. Five walls shape where it runs.
It has something harder to name, and harder to ignore.
In January 2023, two months after ChatGPT launched and right after OpenAI made the GPT APIs publicly available, I started building annexr (site is still up, but product is not). I thought I was late to the party. Three years later, I realize I was early.
That mismatch, feeling late while being early, is the signature experience of building in AI right now. It has nothing to do with individual foresight. It has everything to do with how progress in this field actually compounds.
Three long lines that finally got noticed
What we call “the AI moment” isn’t one thing. It’s the convergence of three independent lines of work, each of which took decades to mature.
1. Information theory. In 1948, Claude Shannon (the original Claude) published A Mathematical Theory of Communication, giving information itself a mathematical language: entropy, channels, compression. It laid the foundation that machine learning, deep learning, and modern AI are built on. Every token prediction in a modern LLM is a descendant of that paper.
2. Hardware. The transistor was invented at Bell Labs in 1947. Seven decades of shrinking, specializing, and parallelizing led to GPUs breaking into deep learning with AlexNet in 2012, and to Google’s TPUs in 2016. The physical substrate that made scale possible.
3. Architecture. In June 2017, a team at Google published Attention Is All You Need, proposing the Transformer. Google understood it well enough to ship BERT in 2018 and T5 in 2019. What it didn’t do was productize at the pace the architecture deserved, likely constrained by search revenue risk.
By 2019 and 2020, with GPT-2 and GPT-3, the three lines had already intersected. The technical trajectory was visible to anyone looking closely. What changed in late 2022 was not the technology but the interface. ChatGPT put the intersection in front of everyone on the internet. The rupture was a moment of attention, not a moment of invention.
One curve vs many
Moore’s Law was one curve: transistor density doubling on a predictable clock. Clean, measurable, predictive.
It’s worth noting that Moore’s Law was never really a law. Externally, it looked like a natural regularity. Internally at Intel, it was a goal, a commitment the company worked hard to keep. When physics pushed back, they adapted: more cores, new process nodes, new architectures. The curve held because a company decided it had to.
AI progress isn’t like that. There’s no single Intel shepherding a single metric. And there’s no single metric.
It’s a bundle of partially correlated curves: compute, algorithms, inference cost, hardware efficiency, and how long AI systems can work without human help. Each moves fast, each on its own logic, none in perfect lockstep.
Moore’s Law and Wright’s Law, running at once
Moore’s Law runs on the calendar. Wright’s Law runs on cumulative work done.
In 1936, Theodore Wright observed that every time production doubles, costs fall by a predictable percentage. The pattern has held across solar panels, batteries, genome sequencing, semiconductors: the more you make something, the cheaper it gets per unit of output.
What’s unusual about AI is that both effects are running simultaneously. Compute-per-dollar improves on a calendar clock, from chip advances and algorithmic efficiency. Cost-per-capability falls with cumulative deployment, as models get compressed and re-served more efficiently at scale. The two reinforce each other. That coupling is what drives rapid progress. The decoupling, as we’ll see, is what produces the walls.
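The distinction is easier to see in code. A minimal sketch, with made-up rates purely for illustration (the constants and function names here are mine, not taken from the sources below): Moore-style improvement compounds with elapsed time, Wright-style improvement compounds with cumulative units produced.

```python
import math

# Illustrative placeholder rates, not measured values.
MOORE_ANNUAL_IMPROVEMENT = 0.35   # cost falls 35% per calendar year
WRIGHT_LEARNING_RATE = 0.20       # cost falls 20% per doubling of cumulative output

def moore_cost(initial_cost, years):
    """Moore-style: improvement compounds with elapsed time."""
    return initial_cost * (1 - MOORE_ANNUAL_IMPROVEMENT) ** years

def wright_cost(initial_cost, cumulative_units):
    """Wright-style: improvement compounds with cumulative production."""
    doublings = math.log2(cumulative_units)  # doublings since the first unit
    return initial_cost * (1 - WRIGHT_LEARNING_RATE) ** doublings

print(round(moore_cost(100.0, years=5), 1))               # time passes -> cheaper
print(round(wright_cost(100.0, cumulative_units=32), 1))  # volume accumulates -> cheaper
```

The difference matters for forecasting: a Wright-style curve only keeps falling if deployment keeps growing, while a Moore-style curve only keeps falling if R&D investment keeps flowing.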
Four curves worth tracking
Training efficiency and inference efficiency are different problems. The first is about how much you extract from a GPU during a training run. The second is about how cheaply you serve the resulting model. They improve through different mechanisms, and both are improving at once. That’s why the gap between a research result and a commodity API keeps compressing faster than expected.
One useful lens before the numbers: each of these curves is driven mostly by one of the two forces, Moore or Wright, sometimes both. Training compute, hardware efficiency, and algorithmic efficiency are mostly Moore-like, pushed by R&D and capital on a calendar clock. Inference cost is mostly Wright-like, pushed by cumulative usage: the more a model is served, the cheaper it gets to serve. Task horizon is the one that benefits from both sides, which is part of why it is moving as fast as it is.
Training compute: ~4.5x per year, though lumpy, not smooth. (Moore-like) Since 2010, compute used to train frontier models has grown 4.5x annually. Algorithmic efficiency improves separately: each year, the same performance can be achieved with roughly 3x less compute. These aren’t fully independent (better algorithms get reinvested into larger runs), and the frontier is increasingly capital-constrained. When three companies control the capital needed to train frontier models, the curve’s future shape depends on their strategic choices, not just on technology.
Inference cost: repeated order-of-magnitude declines. (Wright-like) Across benchmarks, the cost to reach a given level of capability has fallen between 9x and 900x per year. The wide range reflects real variance: vendor pricing strategy, model compression, architecture changes. The point isn’t a clean rate. It’s that order-of-magnitude drops in the cost of a specific capability have occurred repeatedly. Most of this is learning-by-doing at civilizational scale: the more tokens the industry serves, the more it learns about how to serve them cheaply.
Agent task horizon: doubling roughly every four months (in a narrow benchmark). (Both) METR’s January 2026 update estimates the post-2023 doubling time at about 130 days. Caveat: this measures how long an AI can work on coding tasks before needing help. “Task horizon” isn’t a standardized metric, and it measures length, not reliability. But the trend within the benchmark is steep and consistent. This curve benefits from R&D (new training techniques, better scaffolds) and from deployment feedback (what agents actually fail at in production), which is why it is outpacing curves driven by only one force.
Hardware efficiency: 30–40% better per year. (Moore-like) Hardware costs have declined 30% annually, with energy efficiency improving 40% each year. This partially overlaps with the training compute curve; they aren’t fully independent variables.
These curves also interact causally. Falling inference cost enables more compute at inference time, which improves reasoning on hard tasks, which changes the shape of the task-horizon curve, which then runs into the reliability wall described below.
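To make those rates concrete, here is a rough projection using the headline figures quoted above: 4.5x training compute per year, roughly 3x algorithmic efficiency, about a 30% annual hardware cost decline, and a 130-day horizon doubling. It treats the curves as independent, which, as noted, they are not. A sketch, not a forecast.

```python
# Headline rates quoted in this section; treated as independent for illustration only.
TRAIN_COMPUTE_GROWTH = 4.5      # frontier training compute, per year
ALGO_EFFICIENCY_GAIN = 3.0      # same performance for ~3x less compute, per year
HW_COST_DECLINE = 0.30          # hardware cost falls ~30% per year
HORIZON_DOUBLING_DAYS = 130     # METR's post-2023 estimate, coding tasks only

def effective_training_compute(years):
    """Physical compute growth times algorithmic gains; an upper bound, since the two overlap."""
    return (TRAIN_COMPUTE_GROWTH * ALGO_EFFICIENCY_GAIN) ** years

def hardware_cost(years, initial=1.0):
    """Relative hardware cost after `years` of ~30% annual decline."""
    return initial * (1 - HW_COST_DECLINE) ** years

def task_horizon(days, initial_hours=1.0):
    """Task length an agent can handle, doubling every ~130 days within the benchmark."""
    return initial_hours * 2 ** (days / HORIZON_DOUBLING_DAYS)

print(round(effective_training_compute(1), 1))  # ~13.5x "effective compute" in a year, if independent
print(round(hardware_cost(1), 2))               # ~0.70x relative hardware cost after a year
print(round(task_horizon(365), 1))              # a 1-hour horizon grows to ~7 hours in a year
```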
Where the curves break
The walls appear where the curves decouple, each binding at a different layer.
The logarithmic trap. Compute scales exponentially. Benchmark performance scales roughly in proportion to the log of compute: each doubling of performance requires far more than a doubling of the training budget. At sufficient scale, the cost of the next marginal gain exceeds the economic value it unlocks. The current industry response is called “test-time compute”: it shifts the budget from training to reasoning during use. It works for some tasks. Whether it generalizes, and whether the economics hold at scale, is still open.
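A worked version of the trap, under the simplifying assumption that benchmark score grows linearly with the log of training compute (a toy model, not a fitted scaling law): doubling the score means squaring the compute, not doubling it.

```python
import math

def score(compute, k=10.0):
    """Toy model: score grows with the log of compute. k is arbitrary; ratios don't depend on it."""
    return k * math.log10(compute)

c1 = 1e24     # some training budget, in FLOP
c2 = c1 ** 2  # to double k*log10(c), you need log10(c2) = 2*log10(c1), i.e. c2 = c1**2

print(score(c2) / score(c1))   # 2.0: the score doubled...
print(c2 / c1)                 # 1e24: ...for a 10^24-fold increase in compute, not a 2-fold one
```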
The data wall. High-quality human text is largely exhausted at current frontier training scales. Synthetic data was expected to trigger model collapse, but reasoning models have changed the picture: tasks with crisp right/wrong answers (math, code, logic) and model self-play in narrow domains have kept the curve alive. The open question is whether these techniques generalize to everything else.
The reliability gap. Task horizon measures length, not robustness. An agent that can work for 100 hours but has a 1% chance of catastrophic failure each hour is still unusable for anything high-stakes. Public reliability data lags horizon data badly, itself a signal that the industry has optimized for the metrics that are easy to benchmark.
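The arithmetic behind that example, assuming failures are independent hour to hour (an assumption, and a generous one): a 1% hourly failure rate leaves only about a one-in-three chance of getting through a 100-hour task cleanly.

```python
# Independent 1%-per-hour failure chance, compounded over a 100-hour task.
hourly_failure = 0.01
hours = 100

p_clean_run = (1 - hourly_failure) ** hours
print(round(p_clean_run, 3))      # ~0.366: about 1-in-3 odds of no catastrophic failure
print(round(1 - p_clean_run, 3))  # ~0.634: about 2-in-3 odds of at least one
```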
The power grid constraint. Moore’s Law was about density: doing more in the same space and power. AI scaling is increasingly about absolute magnitude: bigger substations, more gigawatts. Hardware efficiency is improving, but cluster size is growing faster. We’re moving from a silicon-constrained world to a copper-and-transformer-constrained one. And by transformer I mean the electrical kind, the unglamorous infrastructure that physically moves gigawatts into a data center. Power grids don’t follow Moore’s Law. They follow the much slower clock of civil engineering and utility regulation.
The institutional curve, which is essentially flat. Even if agent capability doubles every four months, the time for a large enterprise to approve, audit, and deploy an autonomous agent is measured in years. Compliance, procurement, security review, change management: none of these scale exponentially. The AI curves are steep. The institutional curve is close to flat. Most of the interesting work of the next decade sits in that gap: not pushing the capability frontier further, but compressing the institutional one. Packaging, integrating, and derisking systems so organizations can absorb them at anything approaching the speed they’re arriving.
What this means
The broad direction is hard to argue with. For several years, two things have been improving roughly tenfold per year: how much capability you can get for a given price, and how long an AI system can work without a human in the loop. And they run on different clocks: the first falls mostly with cumulative usage, the second is pushed by both R&D and deployment feedback.
That combination is rare in the history of technology.
The reason it feels like a sudden change, rather than a smooth curve, is that several things happened at once: models crossed from interesting demo to actually useful, ChatGPT put them in front of a billion people, APIs pushed them into every tool people already use, and human intuition about how fast software can improve simply lagged the reality.
The practical effect is that the line of what’s worth building keeps moving. Something too expensive or too unreliable last quarter can make sense this quarter. Something that barely works today can be boring infrastructure a year from now.
But the problems that block real adoption have shifted. They’re no longer mostly about capability. They’re about reliability, power and data center capacity, data quality, and how long it takes large organizations to trust and integrate new systems. Those clocks run much slower.
The gap keeps widening. The AI side is steep and will stay steep. The institutional side is close to flat and will stay flat. Most of the interesting work for the rest of this decade sits in closing that gap, not in pushing the frontier further.
Sources and further reading
- Epoch AI: Trends in frontier AI training compute and algorithmic efficiency
- Epoch AI: LLM inference price trends
- METR: Measuring AI ability to complete long tasks (Jan 2026 update)
- Stanford HAI: AI Index Report
- Shannon, A Mathematical Theory of Communication (1948)
- Vaswani et al., Attention Is All You Need (2017)
- AlexNet (2012)
- Moore’s Law
- Wright’s Law / Experience curve effects