From the ENIAC to the Cerebras chip, a clear-eyed look at hardware innovation reveals something frankly amazing.

For anybody who did any kind of computing in the 1950s or 1960s (or, for that matter, in the 1970s or 1980s), the advances that we’ve made in hardware are staggering.
But if you track them against Moore’s law and really analyze how incremental doubling works, it all sort of makes sense.
If you remember the story about the smartie (in some versions of the tale, the inventor of the game of chess) who asked a king for a single grain of rice on the first square of a chessboard, doubled on each of the remaining 63 squares, you get some sense of how this works – an exponential advance that starts out small and reasonable, and ends up in the realm of what people would have originally thought to be fantasy.
It’s all basic math, but it still seems like a magic trick – you’re moving along: 1, 2, 4, 8, 16, 32 – and then you wind up with a number bigger than you can visualize! I also found this aside from Wikipedia, describing the original contriver of this trick, to be riotously funny: “Versions (of the story) differ as to whether the inventor becomes a high-ranking advisor or is executed.”
This idea, too, has been applied to our new AI gold rush. In fact, for students of artificial intelligence minutiae, none other than Ray Kurzweil talks about the “second half of the chessboard” phenomenon, which goes right to the heart of what I’m trying to articulate: the first numbers seem very reasonable until you get about halfway through the series, and then the real exponential change happens. In other words, there’s a slow rise, and then a break point where the line spirals up into the stratosphere.
If you double the grains of rice on a chessboard and come up with 18 quintillion, 446 quadrillion, 744 trillion and change, you get a powerful visual example of how this works.
Well, not a visual example, exactly, because you can’t fit all of those grains of rice into a single image, but a conceptual idea of how it works.
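The arithmetic behind the legend is easy to check in a few lines of Python (a minimal sketch of the doubling, using the figures already quoted above):

```python
# Sum the grains on a chessboard: 1 grain on square 1, doubling on each square.
total = sum(2 ** square for square in range(64))  # squares 0..63

print(total)  # 18,446,744,073,709,551,615 grains, i.e. 2**64 - 1

# Kurzweil's "second half of the chessboard": the last 32 squares
# dwarf everything that came before.
first_half = sum(2 ** s for s in range(32))
second_half = total - first_half
print(second_half // first_half)  # the second half holds ~4.3 billion times more
```

The second half of the board holds exactly 2³² times as many grains as the first half – which is why the curve feels reasonable for a while, and then suddenly doesn’t.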
The Frenetic Pace of Hardware Engineering
I previously wrote about the Cerebras WSE chip, which is about the size of a dinner plate and has some 900,000 cores.
Sources including IEEE Spectrum show that the WSE‑2 (Cerebras CS‑2) delivers approximately 7,500 trillion FP16 operations per second (7.5 petaFLOPS), while the WSE‑3 reaches up to 125 petaFLOPS.
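The size of that generational jump is worth pausing on; a quick sanity check on the figures above:

```python
# Generational jump between Cerebras wafer-scale parts (FP16 figures cited above).
wse2_pflops = 7.5    # WSE-2 / CS-2: ~7,500 trillion FP16 ops/s
wse3_pflops = 125.0  # WSE-3: up to 125 petaFLOPS

speedup = wse3_pflops / wse2_pflops
print(round(speedup, 1))  # roughly 16.7x in a single generation
```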
This is a powerful thing to hold in your hand, and it demonstrates the power of parallel processing. It wasn’t too many years ago that we were using single core chips, then dual core, then quad core. Well, you get the idea.
Now we have things like the Huawei CloudMatrix, where some of these numbers are kept under wraps, but you know intuitively that the hardware power is unreal.
Hardware as a Habitat
I recently attended an interesting TED talk by Caleb Sirak, in which the young innovator described hardware as a “silicon prison.”
He went over the history of hardware acceleration, from early systems measured in millions of operations per second to later ones measured in trillions, and even quadrillions.
He pointed out that in some ways, this process was driven by gaming, but it quickly made its way past that single use case with Nvidia Cuda and other designs.
Now, he suggested, it’s time to change with the times.
“It’s time to rethink the whole system,” Sirak said.
Winning the Hardware Lottery
One tool in pioneering new kinds of GPUs, Sirak added, is quantization.
For example, a 4-bit multiplier is orders of magnitude more efficient than a 32-bit multiplier.
He talked about innovations like the Cerebras chip in this context, and how you can shorten the path of data transfers to drive efficiencies with AI.
“When each parameter is smaller, we can put more of them across the system per second, and this decreases bottlenecks across memory and network interconnects,” he said.
He talked about xAI’s Colossus, with what he called a “road map” to a million GPUs. And then, back to efficiencies:
“You can take a flexible city street grid with a lot of different cars,” he explained, “or you can have an F1 track, and if you have an F1 track, that F1 car goes pretty damn fast.”
He covered the work of various companies in designing smart swarms of hardware pieces, which he referred to as “intelligent colonies.”
“The power that we’re unlocking through these AI chips and these advancements has profound implications,” he added, “and increasingly accessible and globally shared innovation is crucial. At the heart of this revolution lies a supply chain of remarkable complexity, from rare earth minerals combined on one continent, to chemicals on another, and chips on the third.”
As another example of an interconnected world, he noted that a single chip can cross dozens of national boundaries before being ultimately put into production. That makes abundant sense to anyone who knows that Taiwan Semiconductor handles the lion’s share of advanced chip fabrication worldwide.
In any case, we really are on the hockey-stick curve of a hardware acceleration for the ages. Stay tuned.