Hyperscalers are all making ASICs — so why are they still buying from Nvidia and AMD?

Will ASICs completely take over from the GPU workhorses?

The world’s largest technology companies are doing something that looks, at first glance, like a contradiction. Amazon, Google, and Meta have poured billions into developing custom ASICs built for their exact computational needs, yet they keep placing enormous orders for GPUs from Nvidia and AMD.

But this isn’t a contradiction at all. It’s a calculated strategy: drive down internal infrastructure costs while still serving the wildly varied demands of external customers on their cloud platforms. Understanding why these companies are walking both paths at once reveals a lot about where AI computing actually stands today, and where it’s likely headed.

Internal vs external

Custom ASICs, like Google’s Tensor Processing Units, Amazon’s Trainium chips, and Meta’s MTIA silicon, exist to solve a specific problem — handling the repetitive, predictable workloads that run continuously at massive scale. Think machine learning inference, video processing, the algorithms behind news feeds and recommendation engines. When you know exactly what computation you need to run millions of times daily, building hardware purpose-built for that task starts to make a lot of economic sense.

Where ASICs shine is in performance-per-watt and total cost of ownership for these steady-state computations. The silicon gets stripped down: unnecessary features removed, every transistor dedicated to the exact operations the workload requires. At hyperscaler volumes, even small efficiency gains compound into serious savings.
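
As a rough illustration of that compounding, here is a back-of-envelope sketch in Python. Every number in it (fleet size, board power, electricity rate, the ~30% efficiency edge) is an assumption chosen for illustration, not a vendor figure:

```python
# Back-of-envelope: why small per-chip efficiency gains compound at fleet
# scale. All numbers are illustrative assumptions, not vendor figures.

FLEET_CHIPS = 100_000     # assumed accelerator fleet size
GPU_WATTS = 700           # assumed board power of a general-purpose GPU
ASIC_WATTS = 700 * 0.70   # assume the ASIC draws ~30% less for the same work
HOURS_PER_YEAR = 24 * 365
USD_PER_KWH = 0.08        # assumed industrial electricity rate

def annual_power_cost(chips: int, watts: float) -> float:
    """Yearly electricity spend for a fleet running around the clock."""
    kilowatt_hours = chips * watts * HOURS_PER_YEAR / 1000
    return kilowatt_hours * USD_PER_KWH

gpu_cost = annual_power_cost(FLEET_CHIPS, GPU_WATTS)
asic_cost = annual_power_cost(FLEET_CHIPS, ASIC_WATTS)
print(f"GPU fleet:  ${gpu_cost:,.0f}/year in electricity")
print(f"ASIC fleet: ${asic_cost:,.0f}/year in electricity")
print(f"Difference: ${gpu_cost - asic_cost:,.0f}/year")
```

With these assumptions, a 30% power edge is worth roughly $15 million a year in electricity alone, before counting cooling or hardware costs.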

GPUs are used a little differently. They’re purchased primarily to serve the third-party developers and enterprises building on AWS, Google Cloud, and Azure. External customers deploying AI workloads need flexible hardware that can handle diverse, often unpredictable tasks. A startup experimenting with some novel ML architecture and an enterprise running battle-tested production models both need general-purpose acceleration capable of adapting to whatever they throw at it. This is where purchased GPUs remain indispensable.

GPUs aren’t going anywhere

Nvidia’s CUDA ecosystem has become the de facto standard for AI and high-performance computing development, and that dominance runs far deeper than hardware specs. It’s about software compatibility and developer expectations. When researchers and engineers write AI code, they’re overwhelmingly writing it for CUDA-compatible GPUs, expecting that code to work across platforms and cloud providers without major rewrites.

This portability creates a real barrier to ASIC adoption for general-purpose workloads. Code tuned for Google’s TPUs won’t easily migrate to Amazon’s Trainium or run on a customer’s on-premises setup. Enterprises that want to avoid vendor lock-in, or that simply need flexibility across environments, find that GPUs offer a kind of portability custom silicon can’t replicate.
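
A minimal PyTorch sketch shows what that portability looks like in practice. The same script runs unchanged on an Nvidia GPU in any cloud and falls back to CPU elsewhere; targeting TPUs or Trainium instead means porting to a different stack (XLA or the Neuron SDK) rather than changing one line:

```python
# Device-agnostic PyTorch: this runs on any CUDA GPU, in any cloud,
# or on a CPU-only machine, with no rewrites.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy model standing in for whatever the customer actually deploys.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).to(device)

batch = torch.randn(32, 512, device=device)
with torch.no_grad():
    logits = model(batch)
print(f"ran on {device}, output shape {tuple(logits.shape)}")
```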

The architectural differences matter too. GPUs are programmable accelerators that can pivot between different tasks. ASICs are, by definition, hardwired for specific operations. One GPU can handle standardized inference in the morning and experimental research by afternoon. A custom ASIC designed for transformer inference? It can’t suddenly handle a novel architecture without going back to the drawing board.

Efficiency is king

None of this diminishes the efficiency case for custom ASICs, which is increasingly hard to ignore. Industry analysis suggests ASICs deliver 30 to 40 percent better power efficiency than general-purpose GPUs for their target workloads. For certain specific operations, the gains are even more dramatic: two to ten times better performance per watt.

These differences are starting to matter more as data centers hit what some analysts call the “power wall”: the point where electricity availability, not chip availability, becomes the constraint on AI scaling. Major tech companies are already scrambling for access to power generation capacity, exploring unconventional energy sources to feed their AI ambitions. In that environment, extracting more computation from every megawatt becomes a massive priority.
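
The power-wall arithmetic is easy to sketch. Under a fixed site power budget, a performance-per-watt advantage converts one-for-one into extra deployable compute; the figures below are illustrative assumptions, not measurements:

```python
# Fixed power budget: perf-per-watt decides how much compute fits.
# All figures are illustrative assumptions.

SITE_BUDGET_MW = 100   # assumed fixed power envelope for one site
CHIP_WATTS = 700       # assumed power draw per accelerator
GPU_TFLOPS = 1_000     # assumed useful throughput per GPU
ASIC_GAIN = 1.35       # assume ~35% better performance per watt

def site_compute(tflops_per_chip: float) -> tuple[int, float]:
    """Chips that fit in the budget, and their combined throughput."""
    chips = int(SITE_BUDGET_MW * 1_000_000 / CHIP_WATTS)
    return chips, chips * tflops_per_chip

gpu_chips, gpu_total = site_compute(GPU_TFLOPS)
asic_chips, asic_total = site_compute(GPU_TFLOPS * ASIC_GAIN)

print(f"GPUs:  {gpu_chips:,} chips -> {gpu_total:,.0f} TFLOPS in {SITE_BUDGET_MW} MW")
print(f"ASICs: {asic_chips:,} chips -> {asic_total:,.0f} TFLOPS in {SITE_BUDGET_MW} MW")
```

Same megawatts, roughly a third more usable compute. When power, not silicon, is the scarce input, that gap decides what a site can serve.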

Hyperscalers are also pricing their internal ASIC-based services at 30 to 50 percent discounts relative to equivalent GPU cloud offerings. Those numbers point to substantial total cost of ownership advantages for running workloads on custom silicon. At massive scale, these efficiency and cost benefits more than justify the significant upfront investment in chip design and manufacturing.
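
How large that upfront investment can be before the math stops working is a simple breakeven question. The figures below are invented for illustration (real design and unit costs are closely guarded), but the shape of the calculation holds:

```python
# Breakeven sketch: chips deployed before per-unit savings repay the
# one-time design cost. All figures are illustrative assumptions.

DESIGN_COST = 500_000_000   # assumed one-time design/tape-out/software spend
GPU_UNIT_COST = 30_000      # assumed price of a purchased flagship GPU
ASIC_UNIT_COST = 12_000     # assumed all-in cost of an equivalent custom chip

savings_per_chip = GPU_UNIT_COST - ASIC_UNIT_COST
breakeven_units = DESIGN_COST / savings_per_chip
print(f"Breakeven after ~{breakeven_units:,.0f} chips deployed")
# ~28,000 chips: modest next to hyperscaler deployment volumes, which is
# why the economics only work at their scale.
```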

Will custom chips take over?

Custom silicon isn’t a quick optimization — it’s a long-term strategic bet. Google has already cycled through six generations of TPU architecture and now runs most of its internal software on custom chips. Amazon and Meta are charting similar courses with their own silicon programs. Each generation brings better efficiency, broader capability, and an expanding range of supported workloads.

The fundamental tension driving this dual strategy — owning your hardware destiny while still offering customers flexible solutions — isn’t resolving anytime soon. Hyperscalers clearly want to reduce their Nvidia dependency and squeeze costs out of their massive internal operations. Custom ASICs deliver on both fronts. But the need to support external startups and enterprises with diverse, portable, flexible compute requirements pulls hard in the opposite direction.

Whether hyperscalers will eventually build ASIC offerings compelling enough, and well-supported enough, to pull third-party customers away from GPUs remains an open question. Getting there would require more than competitive hardware: comprehensive software ecosystems, developer tools, and migration paths that rival what Nvidia has constructed over decades. Most industry analysts remain skeptical that displacement on that scale will happen anytime soon.

For the foreseeable future, hyperscaler GPU demand will likely stay substantial even as custom silicon programs expand. The two approaches serve different needs and different customers — and both will continue playing essential roles in the AI infrastructure underpinning the modern tech industry.
