The AI moment was seventy years in the making
April 2026 • 484 words • 3 min read
What we call the AI moment isn't one breakthrough. It's the convergence of three threads that took seventy years to mature: a mathematical language for information, hardware that could run the math, and an architecture that changed everything. The rupture of 2022 was a moment of attention, not invention.
Three independent lines of work. Each took decades. None waited for the others.
In 1947, Bell Labs invented the transistor. A year later, Claude Shannon, working in the same building, published a paper on the mathematics of communication. Neither discovery knew what it was starting.
What we call “the AI moment” is not one thing. It is the convergence of three threads, each of which matured on its own schedule, independent of the others.
The first thread: a language for information itself
In 1948, Claude Shannon published A Mathematical Theory of Communication. He was trying to solve a practical problem: how do you transmit a signal reliably over a noisy channel?
What he produced was something much larger. A mathematical language for information itself. Entropy. Compression. Channel capacity. The idea that meaning could be separated from medium, quantified, and reasoned about formally.
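Two of those ideas compress into a single line each. In standard modern notation (not Shannon's original typography), the entropy of a source and the capacity of a channel are:

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x)
\qquad
C = \max_{p(x)} I(X; Y)
```

Entropy counts the bits a message inherently needs; capacity bounds how many bits a noisy channel can carry reliably.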
That paper laid the mathematical foundation that machine learning, deep learning, and modern AI are built on. Every token prediction in a modern language model is a descendant of it. Shannon did not know he was laying the foundation for AI. He was solving a telephone problem.
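To make the lineage concrete, here is a hypothetical toy example; the three-word vocabulary and the probabilities are invented for illustration. The quantity a language model minimizes during training is cross-entropy, measured in the same bits Shannon defined:

```python
import numpy as np

# Hypothetical toy setup: vocabulary, distribution, and observed token
# are all invented for illustration.
vocab = ["the", "cat", "sat"]
p_model = np.array([0.7, 0.2, 0.1])      # model's next-token probabilities
observed = vocab.index("cat")            # the token that actually came next
loss_bits = -np.log2(p_model[observed])  # Shannon surprise: about 2.32 bits
print(f"cross-entropy for this token: {loss_bits:.2f} bits")
```

Averaged over a corpus, that number is the training loss. Lowering it means the model needs fewer bits to encode the text: Shannon's compression, reframed as learning.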
The second thread: hardware that could actually run the math
The transistor was invented in 1947. Seven decades of shrinking, specializing, and parallelizing followed.
The moment hardware met modern AI visibly was 2012. A team at the University of Toronto trained a neural network called AlexNet on GPUs and won ImageNet by a margin that made everyone in the field stop and recalculate. GPUs were designed for graphics. They turned out to be the right shape for deep learning: massively parallel, fast at matrix operations, cheap enough to experiment with.
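A rough sketch of why the shapes matched, with illustrative sizes rather than AlexNet's real dimensions: the inner loop of a neural network is a matrix multiply, and every element of the result can be computed independently.

```python
import numpy as np

# One dense layer is a matrix multiply plus a pointwise nonlinearity.
# At these (made-up) sizes that is roughly 8.6 GFLOPs, and every output
# element is independent of every other: exactly the kind of work a
# massively parallel GPU was built to do.
x = np.random.randn(256, 4096)   # a batch of activations
W = np.random.randn(4096, 4096)  # layer weights
h = np.maximum(x @ W, 0.0)       # matmul + ReLU, fully parallelizable
```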
Google followed with custom silicon. TPUs arrived in 2016, designed specifically for neural network workloads. The physical substrate that made scale possible was now being purpose-built for it.
The third thread: an architecture that changed everything
In June 2017, a team at Google published Attention Is All You Need. The paper proposed the Transformer: a new architecture that dropped recurrence entirely and used attention mechanisms to process sequences in parallel.
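The core operation fits in a few lines. This is a minimal NumPy sketch of single-head scaled dot-product attention, without the masking and learned projections a full Transformer adds:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # score all position pairs at once
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted blend of values

seq_len, d = 8, 16
Q = K = V = np.random.randn(seq_len, d)  # self-attention: one sequence
out = attention(Q, K, V)                 # shape (8, 16), no recurrence required
```

Because the score matrix covers every pair of positions in one matrix multiply, training saturates exactly the parallel hardware the second thread had spent decades producing.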
Google understood the Transformer well enough to ship BERT in 2018 and T5 in 2019. What it did not do was productize at the pace the architecture deserved. The constraint was probably search revenue. If your core business is people finding answers on web pages, a model that answers questions directly is a threat you have to manage carefully.
OpenAI had no such constraint. GPT-2 in 2019. GPT-3 in 2020. The technical trajectory was visible to anyone watching. The three threads had already crossed.
What actually happened in 2022
By the time ChatGPT launched in November 2022, nothing fundamental had changed in the underlying technology. The models were better, but the architecture, the hardware, the training approach: all of it had been in place for years.
What changed was the interface. A chat box that anyone could use. No API key. No prompting knowledge required. The intersection of seventy years of work was suddenly in front of everyone with an internet connection.
The rupture of 2022 was not a moment of invention. It was a moment of attention.
The three threads took different amounts of time. Shannon’s 1948 paper needed more than sixty years of hardware progress before it could run at scale. The Transformer needed just over five years to go from research paper to the technology inside a product used by a hundred million people.
The gap between discovery and deployment has been shrinking the whole time. That, more than any single breakthrough, is what makes the current moment feel different from the ones before it.
The follow-on question, where those curves are headed and where they break, is what I wrote about in AI doesn’t have a Moore’s Law.