There might be movement from vendors like Nvidia and AMD, but supply issues run a whole lot deeper
A few short years ago, it would have been hard to predict how massive the demand for AI compute would become. These days, it’s easy to focus on the large language models themselves, but while the likes of OpenAI, Google, and Anthropic battle it out for model supremacy, under the hood a very different power play is unfolding. It turns out that control of compute may matter even more than control of data. The real battleground isn’t how well the models perform – it’s who has access to the semiconductors and wider infrastructure that power them.
This shift hasn’t happened gradually, either. Chip foundries are scrambling to expand, leading-edge packaging and interconnect capacity is overbooked, and startups and AI labs are being squeezed out. The problem? Capacity won’t magically scale overnight, and the resulting imbalance is giving way to a new hierarchy: hyperscalers with pre-locked deals and custom accelerators, mid-tier infrastructure brokers, and the rest jostling for scraps or inventing around scarcity.
Locking up supply
Supply issues have been plaguing the major AI semiconductor providers for years, and the AI boom has only made the problem worse – as highlighted by the recent OpenAI-AMD deal, which followed an earlier agreement between OpenAI and Nvidia.
The announcement confirmed a 6-gigawatt, multi-year agreement for AMD to supply its Instinct accelerators, beginning with 1GW of deployments in 2026. The deal also included a warrant allowing OpenAI to purchase up to 160 million AMD shares, effectively tying both companies’ futures together.
There are multiple ways of looking at this.
- OpenAI wants to reduce its reliance on a single semiconductor company.
- Nvidia may not be able to supply the volume of semiconductors OpenAI needs as the lab ramps up.
The truth likely lies somewhere in the middle. Nvidia’s most advanced chips – such as those based on its Blackwell architecture – are booked far in advance. Even as the company races to ramp production, limited access to CoWoS (Chip-on-Wafer-on-Substrate) packaging and HBM3E memory has created a bottleneck that affects nearly every vendor. AMD faces similar challenges, relying on the same TSMC and memory suppliers. So while diversification makes strategic sense for OpenAI, it doesn’t necessarily exempt the company from the same underlying supply constraints.
And OpenAI isn’t stopping there. The company is also co-designing its own AI inference chips with Broadcom, targeting specialized accelerators rather than general-purpose GPUs. Meta and Google have made similar moves, underscoring a broader industry trend: the hyperscalers’ desire to control as much of their stack as possible.
But what these deals highlight is that, OpenAI aside, semiconductor supply is set to remain constrained for the foreseeable future. As a result, the bigger players feel compelled to lock up as much leading-edge capacity as they can – wafer starts, packaging slots, and HBM allocations – as far ahead as possible. That risks squeezing everyone else out.
Competing in a world of scarcity
With AI inference now such a hot commodity, smaller AI outfits and labs that rely on their own infrastructure have had to get creative.
“We’ve had to get creative — using hybrid setups that combine local inference with optimized cloud usage, and prioritizing model efficiency over brute-force training,” said Igor Trunov, founder of Atlantix, a venture studio aimed at building AI-driven startups, in an interview with RCR Tech. “Those who can adapt architectures, use open-weight models, and optimize hardware usage will survive this phase. The chip scarcity is forcing a new wave of innovation, not just in hardware, but in how we design and deploy AI systems.”
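For a loose illustration of what a “hybrid setup” can look like in practice – this is a hypothetical sketch, not Atlantix’s actual architecture – the routing logic can be as simple as sending small, routine requests to a local open-weight model and falling back to a hosted endpoint only when a rough complexity budget is exceeded:

```python
# Hypothetical hybrid-inference router: names and thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class Route:
    backend: str  # "local" (open-weight model on owned GPUs) or "cloud" (hosted API)
    reason: str


def route_request(prompt: str, max_local_tokens: int = 512) -> Route:
    """Pick a backend using a crude word count as a stand-in for prompt complexity."""
    approx_tokens = len(prompt.split())
    if approx_tokens <= max_local_tokens:
        return Route("local", f"~{approx_tokens} tokens fits the local model budget")
    return Route("cloud", f"~{approx_tokens} tokens exceeds the local budget")


if __name__ == "__main__":
    print(route_request("Summarize this support ticket in one sentence."))
```

In production, the heuristic would typically be replaced with a real token count, a latency budget, or a lightweight classifier, but the shape of the trade-off stays the same: keep cheap work local, and pay for cloud compute only when necessary.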
There are a few ways startups are adapting and innovating in a world where compute is scarce or cost-prohibitive. One is model distillation, which involves training a smaller, more efficient “student” model to replicate the behavior of a much larger “teacher” model – retaining most of the performance while dramatically reducing the compute, memory, and power needed for inference or further training.
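As a rough sketch of the idea – toy models and illustrative hyperparameters, not any particular lab’s pipeline – the core of distillation is a loss that pushes the student’s softened output distribution towards the teacher’s:

```python
# Minimal knowledge-distillation sketch (assumes PyTorch is installed).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a larger frozen "teacher" and a smaller trainable "student".
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softening both distributions helps transfer "dark knowledge"

for step in range(100):
    x = torch.randn(16, 32)                 # stand-in for a real training batch
    with torch.no_grad():
        teacher_logits = teacher(x)         # teacher weights stay frozen
    student_logits = student(x)

    # KL divergence between the softened student and teacher distributions,
    # scaled by T^2 as in the standard distillation formulation.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Real pipelines typically blend this with a standard supervised loss on ground-truth labels, but the payoff is the same: the student ends up far cheaper to serve than the teacher it was trained against.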
Others, like Hippocratic AI, are focusing on domain-specific models tailored to a single field, where tighter data scope and specialized training allow for smaller, more efficient models that still deliver high accuracy. By narrowing the problem space, these startups can achieve strong real-world performance without relying on massive GPU clusters.
Now that large language models have been around for some time, we’ve seen a renewed focus on smaller, more efficient models that can run at a fraction of the cost of their larger siblings (à la Claude 4.5 Haiku). These more efficient models make sense for many firms regardless of compute constraints, given the costs associated with some of the heavier LLMs. Expect smaller, domain-specific models trained on proprietary data to keep growing in popularity as the barrier to entry for LLM training continues to come down.
No overnight fix
Of course, in the long term, it’s expected that AI semiconductors will be increasingly differentiated from the GPUs that have served as the foundation for AI infrastructure so far. That differentiation is expected to come from multiple sources. Some of it will come from the likes of Nvidia and AMD themselves, who have been releasing AI-specific accelerators for some time now.
- Nvidia’s Blackwell and Rubin platforms integrate dedicated inference cores and a next-generation NVLink fabric that blurs the line between GPU and accelerator.
- AMD’s Instinct MI450, to be built on TSMC’s 2nm process, will incorporate HBM4 and improved Infinity Fabric interconnects to boost memory bandwidth and reduce latency.
The rest will come from the hyperscalers and major AI players themselves, who are expanding their chip design ambitions. OpenAI’s collaboration with Broadcom, Google’s TPU roadmap, and Meta’s already-deployed MTIA accelerators all point to an era of AI ASICs optimized for specific workloads. That could, in theory, ease demand for top-end GPUs.
Some custom ASICs have been built on more mature nodes like 7nm or 5nm to reduce pressure on cutting-edge fabs. For example, earlier versions of Google’s TPUs and Meta’s MTIA used 7nm, and the newer MTIA v2 is built on 5nm. By leaning on nodes with higher supply and yield, these firms avoid competing directly for scarce 3nm or 2nm capacity – though actual process choices vary and are often not publicly confirmed. To be fair, Google’s latest-generation chips are built on 3nm, though it’s unclear how many have actually been manufactured.
But in practice, the move to custom silicon is more likely to push bottlenecks further upstream – to foundries like TSMC and Samsung, and to HBM vendors such as SK Hynix, Samsung, and Micron, all of which are already operating at full capacity.
Additionally, new 2nm and HBM4 production won’t arrive in meaningful volume until late 2026 or 2027, meaning today’s shortages are unlikely to resolve quickly. Some companies are experimenting with older process nodes for components that don’t require cutting-edge density, which could help stretch limited 2nm and 3nm supply, while packaging specialists like Amkor and ASE are ramping up advanced packaging capacity to meet hyperscaler demand.
The bigger picture
Control of compute has become a new form of leverage. Access to advanced semiconductors now determines who can build and deploy frontier AI models – and who gets left behind. For hyperscalers, locking in chip supply years in advance has become as strategic as securing data or talent. For smaller players, survival increasingly depends on efficiency – both in model design and in how they use the limited hardware available. Unfortunately, while there are signs the crunch is starting to ease, it’s unlikely to resolve anytime soon.