Amazon’s Trainium3 chip continues hyperscalers’ reduced dependence on Nvidia


In sum – what to know:

Trainium3 marks a major generational jump – AWS’ new 3nm accelerator delivers over 4x gains in compute, and is designed to scale from 144-chip UltraServers to clusters of one million accelerators.

Built for next-gen AI at extreme scale – New NeuronSwitch-v1 interconnects and massive HBM3e capacity target agentic models, video generation, MoE systems, and high-throughput inference, with AWS claiming 5x more output tokens per megawatt than Trainium2.

Hyperscalers push deeper into custom silicon – Amazon joins Google, Meta, Microsoft, and OpenAI in reducing reliance on Nvidia, improving cost control and supply resilience, even as Nvidia’s software ecosystem keeps it firmly atop the AI accelerator market.

Amazon has unveiled Trainium3, its first 3-nanometer AI accelerator and the latest step in hyperscalers' ongoing push to reduce their dependence on Nvidia for AI workloads. The chip represents a significant generational leap over its predecessor, Trainium2, delivering up to 4.4x more compute performance, 40% greater energy efficiency, and higher memory bandwidth. Amazon says Trainium3 is already deployed and now generally available.

Built for massive scale

While the Trainium3 chip itself is impressive, how it is built to scale is arguably more so. Alongside the Trainium3 launch, Amazon announced new UltraServer configurations that integrate up to 144 Trainium3 chips into a single system, delivering 362 FP8 petaflops, 706TB/s of memory bandwidth, and 20.7TB of total HBM3e memory (144GB per chip). Beyond individual servers, Amazon says thousands of UltraServers can be linked together to support clusters of up to one million Trainium3 chips, a tenfold increase in scale over the previous generation. A rough sanity check on the per-chip numbers appears below.
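Taking the published UltraServer figures at face value, simple division recovers approximate per-chip specifications. This is illustrative arithmetic based only on the announced aggregates, not an AWS-provided breakdown:

```python
# Back-of-envelope check of the published UltraServer figures.
# The constants come from the announcement; per-chip values are derived.

CHIPS_PER_ULTRASERVER = 144
ULTRASERVER_FP8_PFLOPS = 362      # aggregate FP8 petaflops
ULTRASERVER_MEM_BW_TBS = 706      # aggregate memory bandwidth, TB/s
HBM3E_PER_CHIP_GB = 144           # HBM3e capacity per chip

# Derived per-chip figures (approximate):
fp8_per_chip = ULTRASERVER_FP8_PFLOPS / CHIPS_PER_ULTRASERVER   # ~2.5 PFLOPS
bw_per_chip = ULTRASERVER_MEM_BW_TBS / CHIPS_PER_ULTRASERVER    # ~4.9 TB/s

# Total HBM3e should match the quoted 20.7TB:
total_hbm_tb = CHIPS_PER_ULTRASERVER * HBM3E_PER_CHIP_GB / 1000  # ~20.7 TB

print(f"FP8 per chip:       ~{fp8_per_chip:.2f} PFLOPS")
print(f"Memory BW per chip: ~{bw_per_chip:.2f} TB/s")
print(f"Total HBM3e:        ~{total_hbm_tb:.1f} TB")
```

The 20.7TB total checks out exactly: 144 chips at 144GB each is 20,736GB.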

The chip also arrives with new networking technology. Amazon says it developed the NeuronSwitch-v1 interconnect to eliminate a bottleneck created by the massive volumes of data moving between chips and servers, and claims the new interconnect doubles chip-to-chip bandwidth. A simple illustration of why that matters follows.
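For a fixed payload, transfer time scales inversely with link bandwidth, so doubling chip-to-chip bandwidth roughly halves the time accelerators spend waiting on communication. The payload and baseline figures below are hypothetical, chosen only to show the relationship:

```python
# Illustrative only: how doubling link bandwidth affects transfer time.
# Payload size and baseline bandwidth are hypothetical, not AWS figures.

payload_gb = 128          # hypothetical per-step data exchange between chips
baseline_bw_gbs = 400     # hypothetical baseline chip-to-chip bandwidth, GB/s

for multiplier in (1, 2):  # 2x models the claimed NeuronSwitch-v1 gain
    bw_gbs = baseline_bw_gbs * multiplier
    seconds = payload_gb / bw_gbs  # time spent moving data
    print(f"{multiplier}x bandwidth ({bw_gbs} GB/s): {seconds * 1000:.0f} ms per exchange")
```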

The chip is purpose-built for next-generation AI workloads, including agentic systems, reasoning models, video generation, reinforcement learning, and Mixture-of-Experts architectures. Amazon claims Trainium3 delivers over 5x higher output tokens per megawatt than Trainium2 at similar latency during large-scale inference tests, positioning it as a more sustainable option for high-throughput AI serving at scale.
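Tokens per megawatt is a throughput-per-power metric: sustained output token throughput divided by power draw. A minimal sketch of how such a comparison would be computed, using made-up numbers rather than AWS benchmark data:

```python
# Hypothetical illustration of a tokens-per-megawatt comparison.
# Neither the throughput nor the power figures are AWS numbers.

def tokens_per_megawatt(tokens_per_sec: float, power_mw: float) -> float:
    """Sustained output token throughput normalized by power draw."""
    return tokens_per_sec / power_mw

# Hypothetical fleets measured at similar latency, per Amazon's framing:
trn2 = tokens_per_megawatt(tokens_per_sec=1_000_000, power_mw=1.0)
trn3 = tokens_per_megawatt(tokens_per_sec=5_000_000, power_mw=1.0)

print(f"Generational gain: ~{trn3 / trn2:.0f}x output tokens per megawatt")
```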

Hyperscalers chart their own course

Amazon’s Trainium3 is part of a broader industry shift in which hyperscalers are increasingly developing proprietary AI silicon rather than relying solely on Nvidia. Google has been iterating on its Tensor Processing Units for years, with recent generations built on advanced process nodes. Meta has deployed its own MTIA accelerators for inference workloads, and Microsoft has reportedly been developing custom AI chips as well. OpenAI, meanwhile, is co-designing inference chips with Broadcom as it seeks to diversify its hardware supply chain.

The motivations behind this trend are clear. Nvidia’s GPUs remain the gold standard for AI training and inference, but demand has consistently outpaced supply, and costs remain high. By building custom accelerators, hyperscalers can optimize for their specific workloads, reduce per-query costs, and gain leverage in negotiations with third-party chip suppliers. It also provides a degree of resilience against supply constraints that have plagued the industry in recent years.

None of this means Nvidia is seriously losing its grip on the AI chip market. The company's software ecosystem, particularly CUDA, remains deeply entrenched across the industry, and its latest Blackwell architecture continues to set the performance bar for general-purpose AI accelerators. But as hyperscalers bring more custom silicon online, Nvidia's dominance is likely to start to wane, and we should expect the company to compete more aggressively now that it is no longer the automatic choice for hyperscalers that want to own their own stack.
