Are we approaching a post-chip era of ‘Data Centers in a Box?’


Wafer-scale processors that bypass chips were the focus of a debatable opinion piece in the Wall Street Journal. Are current industrial policies and protectionism favoring an incumbent industry that is facing obsolescence?

Cerebras Founder and CEO Andrew Feldman lauded a Wall Street Journal opinion piece, “The Microchip Era is about to end.” In it, author and tech futurist George Gilder made the bold statement that “all efforts to save microchip production in the U.S. come amid undeniable portents of the end of microchips.” He blames the “inexorable reticle limits of chips” for the vast hyperscale data center buildouts, and contends that “data centers in a box of wafer scale processors” could be the way the U.S. beats China in the post-chip era.

In his response to Gilder, Cerebras’s Feldman referred to “reticle” limits as “the ceiling for progress,” with “more parts, more wiring, and more overhead — all to work around the reticle limit.”

As we’ve reported, chipmakers attempt to address those limits by moving beyond monolithic single-die chips to more modular, chiplet-based designs that optimize small, functional blocks for specific tasks and assemble them into a complete system-on-chip (SoC) package. This construct invites its own set of problems, such as increased latency and signal degradation between chiplets, as well as complex power delivery and management across the package.

But wafer-scale integration brings its own set of challenges. While the wafer-scale engine’s die-to-die interconnect bypasses many latency bottlenecks of multi-GPU setups, its much larger surface area raises the probability of a defect. Injecting some engineering into the discussion, Circle’s Sebastian Barros (formerly of Ericsson and Google) pointed out in his response to the WSJ opinion that “At 3 nanometers, a wafer has about 80 000 mm² of silicon. Even at elite defect densities, the chance of a flawless wafer is essentially zero.”

Since industry standards for leading-edge logic nodes often demand defect densities below 0.05 defects/cm² on critical layers, a defect can render a $20,000 wafer useless.
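Barros’s “essentially zero” claim is easy to sanity-check with the standard Poisson yield model, where the probability of a defect-free die of area A is roughly exp(−D·A). The sketch below is my own illustration using the article’s figures (0.05 defects/cm², 80 000 mm² of silicon), not numbers from Cerebras or TSMC:

```python
import math

def poisson_yield(defect_density_per_cm2: float, area_mm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson defect model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defect_density_per_cm2 * area_cm2)

# A reticle-limited ~800 mm^2 GPU die at 0.05 defects/cm^2:
print(poisson_yield(0.05, 800))      # ~0.67 -- most individual dies are clean

# The full ~80,000 mm^2 wafer at the same defect density:
print(poisson_yield(0.05, 80_000))   # exp(-40), effectively zero
```

This is why wafer-scale designs generally cannot demand a flawless wafer and instead build in redundancy to route around defective regions.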

Another trade-off for the wafer’s incredible computational performance and speed is heat. Where an advanced Nvidia chip contains about 208 billion transistors, a Cerebras wafer-scale engine features 4 trillion. One Cerebras CS-3 system generates up to 23 kilowatts (kW) of heat when operating at full capacity. The new CS-3 cluster, which stacks 16 CS-3 systems together, has 64 trillion transistors!
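The back-of-envelope arithmetic for that cluster follows directly from the article’s per-system figures (this is my own calculation, not a Cerebras specification):

```python
# Per-system figures cited in the article:
per_system_kw = 23              # up to 23 kW of heat per CS-3 at full capacity
per_system_transistors = 4e12   # 4 trillion transistors per wafer-scale engine
systems = 16                    # CS-3s in the new cluster

print(per_system_kw * systems)           # 368 kW to deliver and remove as heat
print(per_system_transistors * systems)  # 6.4e13, i.e. 64 trillion transistors
```

Roughly 368 kW in one cluster is data-hall-scale power density, which is why the next paragraph’s cooling concerns follow.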

This is why the compressed “data center in a box” presents significant challenges in power, cooling, and cost; enough that I don’t think it’ll be feasible as a universal replacement any time soon. What it will do, however, is accelerate innovation around the physical and engineering limits of chips.

Right now, wafer-scale makes sense for highly specialized customers, like research institutions, national laboratories, and AI companies running massive HPC and AI training and inference workloads.

Though it’s easy to see the enthusiasm for dinner plate-sized silicon, I don’t think we’re at the precipice of a post-chip era. It’ll be a while before compact “data centers in a box” become a ubiquitous alternative to the sprawling data centers currently being built, but it’s good to know they’re possibly on the horizon.
