How chiplets are powering next-gen AI chips


The semiconductor industry is evolving rapidly, and nowhere faster than in AI. As AI workloads grow ever more demanding, traditional monolithic chips are giving way to chiplet-based designs.

But what exactly are chiplets, and how can they radically improve AI performance? Here's a look.

What are chiplets?

Chiplets are small, functional blocks of silicon, each optimized for a specific task, that can be combined within a single package to create a complete system-on-chip (SoC). Unlike traditional monolithic chips, which are manufactured as a single large die, chiplet-based designs allow manufacturers to mix and match different process nodes, IP blocks, and even suppliers.

This modular approach has some serious advantages. For example, chiplets offer better scalability by allowing less critical components to be built on mature, cost-efficient nodes — while reserving cutting-edge processes for high-performance blocks. Because each die is smaller, manufacturing yields improve, reducing waste and overall production costs. They also enable more customization, letting designers build systems for specific workloads by combining logic, memory, and I/O components as needed.
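The yield claim above can be made concrete with the standard Poisson yield model, Y = e^(−D·A), where D is the defect density and A is the die area. A minimal sketch, with illustrative numbers (the defect density and die areas are assumptions for the sake of the example, not vendor data):

```python
import math

def die_yield(defect_density, die_area_cm2):
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-defect_density * die_area_cm2)

# Illustrative scenario: one 8 cm^2 monolithic die versus four 2 cm^2
# chiplets delivering the same total silicon area.
D = 0.1  # assumed defects per cm^2

monolithic = die_yield(D, 8.0)  # the single large die must be defect-free
chiplet = die_yield(D, 2.0)     # each small die is tested individually

print(f"monolithic yield:  {monolithic:.1%}")  # ~44.9%
print(f"per-chiplet yield: {chiplet:.1%}")     # ~81.9%
```

Because each chiplet can be tested and binned before packaging (so only known-good dies are assembled), a defect costs a small die rather than a large one, which is where the waste reduction comes from.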

Major players are betting big on chiplets too. AMD’s EPYC and Instinct lines have pioneered chiplet-based CPUs and GPUs, leveraging its Infinity Fabric interconnect. Intel’s Meteor Lake and Arrow Lake CPUs use Foveros 3D stacking to integrate compute, graphics, and I/O tiles. NVIDIA has also moved to multi-die designs: its Blackwell GPUs pair two reticle-limit dies in a single package for greater flexibility and performance.

Advanced Packaging Techniques


The rise of chiplets has driven a wave of innovation in packaging technologies, each with unique strengths. Here are some examples.

  • CoWoS and CoWoS-L (TSMC): TSMC’s CoWoS technology connects logic dies with stacked HBM on a silicon interposer, delivering higher bandwidth for AI workloads. The latest CoWoS-L used in NVIDIA’s Blackwell B100 and B200 GPUs supports up to eight HBM3E stacks, with expanded capacity planned through 2026 to meet surging hyperscaler demand.
  • Foveros and Foveros Direct (Intel): Intel’s Foveros enables 3D stacking by vertically integrating compute, graphics, and I/O tiles. Meteor Lake was the first major product to use it at scale, while the upcoming Foveros Direct introduces direct copper-to-copper bonding for even denser integration and higher energy efficiency in future Xeon and client CPUs.
  • SoIC Hybrid Bonding (TSMC): TSMC’s SoIC platform supports chip-on-wafer and wafer-on-wafer bonding, allowing logic and memory dies to be stacked without micro-bumps. This reduces latency and power consumption while increasing interconnect density.
  • X-Cube (Samsung): Samsung’s X-Cube focuses on vertical stacking of logic and SRAM dies to increase bandwidth and efficiency without expanding package size. New bump-less versions are improving density and power characteristics, positioning X-Cube for high-performance computing and AI applications.

CoWoS, Foveros, SoIC, and other advanced approaches are changing how components come together to meet the demands of modern AI hardware. As foundries and OSATs (outsourced assembly and test providers) refine these technologies, packaging is becoming a key factor in both performance and manufacturability.

Heterogeneous Integration

Heterogeneous integration refers to combining different types of chips, like logic, memory, and I/O, within a single package. Rather than relying on a monolithic die, this approach brings together specialized components, each built using the most suitable process node for its function.

One of the biggest advantages is performance. Bringing memory physically closer to compute elements reduces latency and boosts bandwidth, which is essential for AI workloads that move massive amounts of data. This proximity also improves energy efficiency, as shorter interconnects consume less power. Flexibility is another key benefit. Engineers can mix and match best-in-class IP from different vendors or process technologies, creating tailored solutions for everything from data center accelerators to mobile SoCs.
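The energy argument can be sketched with a back-of-envelope calculation. The picojoule-per-bit figures below are rough, commonly cited orders of magnitude for each class of interconnect, not measurements of any specific product:

```python
# Approximate data-movement energy by interconnect distance.
# These pJ/bit values are illustrative assumptions, not vendor specs.
ENERGY_PJ_PER_BIT = {
    "on-die SRAM": 0.1,
    "in-package HBM (2.5D interposer)": 3.5,
    "off-package DDR DRAM": 15.0,
}

bytes_moved = 1 * 1024**3       # moving 1 GiB of weights/activations
bits_moved = bytes_moved * 8

results_mj = {}
for path, pj_per_bit in ENERGY_PJ_PER_BIT.items():
    # pJ -> mJ conversion: multiply by 1e-9
    results_mj[path] = bits_moved * pj_per_bit * 1e-9
    print(f"{path:33s}: {results_mj[path]:7.1f} mJ per GiB moved")
```

Even with rough numbers, the ordering is the point: pulling memory into the package via an interposer or 3D stack cuts per-bit energy by roughly an order of magnitude versus going off-package, which compounds quickly at AI-scale data volumes.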

Heterogeneous integration also improves manufacturing economics. Smaller, specialized dies are easier to produce with higher yields, and a faulty chiplet can be screened out before assembly rather than forcing the scrapping of an entire large die. This modularity helps control costs and accelerates development cycles.

However, this shift introduces new challenges. Managing thermal output, maintaining signal integrity, and coordinating complex supply chains across multiple chiplet suppliers all require new design and testing methodologies. Despite these hurdles, the industry is moving quickly in this direction because the performance and efficiency gains are too significant to ignore. As packaging continues to evolve, heterogeneous integration is becoming a cornerstone of next-generation chip design.

Conclusion

Advanced packaging is quietly becoming one of the most important factors in semiconductor progress. By moving beyond single-die chips to modular, chiplet-based designs, the industry is finding new ways to boost performance, efficiency, and flexibility. Technologies like CoWoS, Foveros, and SoIC are central to this evolution, enabling closer integration between compute and memory while improving scalability across products and price points.

As AI workloads grow more complex, packaging will play an increasingly central role in connecting the pieces that drive performance — making it not just a supporting technology, but a key enabler of what comes next.

