OpenAI and Broadcom unveil "Jalapeño" inference chip

OpenAI’s new chip is already running GPT-5.3 workloads

In sum – what we know:

A purpose-built ASIC – Jalapeño strips out training logic to optimize hard for LLM inference like chatbot responses, code generation, and agentic apps.
A nine-month timeline – OpenAI says it co-developed the chip in roughly nine months, crediting its own AI models for speeding up the design work.
Efficiency over breadth – Broadcom’s CEO claims early tests show roughly 50% lower cost and 50% better cost efficiency versus standard AI GPUs, though specs and benchmarks remain unverified.

OpenAI and Broadcom are showing off the first fruits of their custom silicon partnership. The two companies have taken the wraps off “Jalapeño,” a custom application-specific integrated circuit (ASIC) built for one job only: running large language model inference. They are billing it as an “Intelligence Processor” and “AI accelerator,” and it’s the first piece of silicon in what they describe as a multi-generation infrastructure buildout rather than a one-off chip.

According to the two companies, engineering prototypes are already sitting in OpenAI’s labs processing real machine learning workloads, including GPT-5.3-Codex-Spark, at production-target frequency and power. The plan is to begin initial volume production and large-scale deployment by the end of 2026, then scale that rollout progressively through 2029.

None of this is surprising, though it is still a big move for OpenAI. The company wants to own more of its stack, and Broadcom wants to be the custom-silicon partner that helps hyperscalers do exactly that.

Design and architecture

Jalapeño is built for inference, not training. It serves already-trained models in response to active queries and omits the baseline training logic that general-purpose hardware carries. That’s the whole point of an ASIC — strip out the flexibility you don’t need and optimize hard for the workload you do. In Jalapeño’s case, that workload is chatbot responses, code generation, and interactive LLM applications.

The most surprising claim is the timeline. OpenAI says the chip was co-developed in roughly nine months, which is fast for a first-generation custom accelerator. The company credits its own AI models for accelerating the work, applying them to electronic design automation and hardware co-optimization.

Architecturally, OpenAI describes Jalapeño as heavily reducing data movement and balancing raw compute against high-speed internal memory and networking. The stated goal is to combine the throughput of today’s leading AI accelerators with latency closer to the fastest specialized inference systems, which matters most for real-time coding models and agentic products that need to respond fast and stay responsive under load. Detailed specs haven’t been disclosed, and perhaps won’t be unless OpenAI goes the TPU route and starts to sell its chip to others.
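To make the throughput-versus-latency tension concrete, here is a minimal back-of-envelope sketch in Python. It is not based on any disclosed Jalapeño spec: the model size, memory bandwidth, and peak compute figures are hypothetical placeholders, and the roofline-style estimate ignores KV-cache traffic, networking, and everything else a real serving stack has to handle. It assumes each decode step streams the model weights from memory once and takes as long as the slower of memory traffic and arithmetic.

```python
def decode_step_seconds(batch, weight_bytes, mem_bandwidth, flops_per_token, peak_flops):
    """Rough time for one decode step that yields one token per request in the batch."""
    memory_time = weight_bytes / mem_bandwidth            # weights read once, shared by the batch
    compute_time = batch * flops_per_token / peak_flops   # arithmetic grows with batch size
    return max(memory_time, compute_time)


def tokens_per_second(batch, step_seconds):
    """Aggregate tokens generated per second across the whole batch."""
    return batch / step_seconds


# Hypothetical chip and model, chosen only to keep the arithmetic easy to follow.
WEIGHT_BYTES = 100e9     # assumed: 100 GB of weights
MEM_BANDWIDTH = 4e12     # assumed: 4 TB/s memory bandwidth
FLOPS_PER_TOKEN = 2e11   # assumed: about 2 FLOPs per weight per token
PEAK_FLOPS = 1e15        # assumed: 1 PFLOP/s peak compute

if __name__ == "__main__":
    for batch in (1, 125, 250):
        step = decode_step_seconds(batch, WEIGHT_BYTES, MEM_BANDWIDTH,
                                   FLOPS_PER_TOKEN, PEAK_FLOPS)
        print(f"batch={batch:>3}  latency/token={step * 1e3:.1f} ms  "
              f"throughput={tokens_per_second(batch, step):.0f} tok/s")
```

Under these made-up numbers, small batches are limited by data movement: per-token latency stays near 25 ms up to roughly 125 concurrent requests, and beyond that point latency climbs while throughput stops improving. In this simplified model, cutting the bytes moved per step is what lowers the latency floor, which is one way to read the emphasis on reducing data movement.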

That’s not out of the question. OpenAI says Jalapeño is designed with flexibility to work with all LLMs across the industry, not just its own models. That’s a notable choice for a chip otherwise so tightly bound to OpenAI’s roadmap of models, kernels, and serving systems.

Compared to Nvidia’s Hopper and Blackwell, the distinction is clean. Those are general-purpose GPUs that handle both training and inference, while Jalapeño is a specialized inference ASIC and nothing more. It trades breadth for efficiency on a narrow set of tasks. Against Google’s TPU and Meta’s in-house accelerators, Jalapeño fits squarely into a now-familiar pattern — hyperscalers building bespoke silicon mapped to their own workloads rather than buying off the shelf. Broadcom already builds custom ASICs for Google and Meta, so OpenAI joining the list is less a surprise than a confirmation.

The performance pitch leans heavily on efficiency. OpenAI says early testing shows Jalapeño delivering better performance per watt than current state-of-the-art AI accelerators on relevant workloads. Broadcom’s CEO has gone further, saying initial tests show roughly 50% lower cost and roughly 50% better cost efficiency versus standard AI GPUs.
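The two 50% figures are easy to conflate, so a quick arithmetic check helps. The sketch below assumes “cost” means cost per unit of work and “cost efficiency” means work per dollar. Neither metric nor the GPU baseline has been defined publicly, so the function names and this interpretation are assumptions for illustration only.

```python
def cost_reduction_from_efficiency_gain(gain: float) -> float:
    """Fractional drop in cost per unit of work when work-per-dollar rises by `gain`."""
    return 1.0 - 1.0 / (1.0 + gain)


def efficiency_gain_from_cost_reduction(reduction: float) -> float:
    """Fractional rise in work-per-dollar when cost per unit of work falls by `reduction`."""
    return 1.0 / (1.0 - reduction) - 1.0


if __name__ == "__main__":
    # 50% more work per dollar is only about a third off the cost of each unit of work.
    print(f"{cost_reduction_from_efficiency_gain(0.5):.1%}")
    # 50% off the cost of each unit of work doubles the work per dollar.
    print(f"{efficiency_gain_from_cost_reduction(0.5):.1%}")
```

Under those definitions, a 50% gain in work per dollar corresponds to about a 33% drop in cost per unit of work, while a 50% drop in cost per unit of work corresponds to a 100% gain in work per dollar. Until the underlying metrics are published, it isn’t clear which reading applies.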

Scale and deployment

Jalapeño isn’t being sold as a standalone chip. OpenAI, Broadcom, and Celestica are building it into finished server racks, boards, and networking stacks, delivering vertically integrated systems rather than loose silicon. That mirrors Broadcom’s broader strategy of selling rack-scale systems, which command higher revenue and tie the company more deeply into its customers’ deployments.

This is the hardware foundation for the much-reported 10 gigawatt custom AI computing buildout that OpenAI and Broadcom announced earlier. Under that collaboration, OpenAI designs the accelerators and systems while Broadcom leads development, manufacturing, and deployment. Jalapeño is the first generation of silicon meant to fill that pipeline.

The strategic logic for OpenAI is mostly about cost and control. Inference is a major share of what it costs to run ChatGPT and the API at scale, and custom silicon is the company’s bet on more predictable per-query economics. By shifting some of its inference footprint to Jalapeño, OpenAI reduces its exposure to Nvidia’s pricing and supply constraints. The catch is that it trades that dependence for a tighter dependence on Broadcom and its foundry partners.

Then there’s the 10 GW figure itself. That’s comparable to the peak electricity demand of a small country. OpenAI and Broadcom argue that more efficient chips reduce total power growth versus a GPU-only approach, and per unit of work that’s plausible. But efficiency gains have a way of enabling bigger deployments rather than smaller power bills, and a buildout this size runs straight into grid capacity limits, local permitting bottlenecks, and the environmental questions that follow any datacenter expansion of this magnitude. The chip being efficient doesn’t make 10 gigawatts small.
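A rough calculation shows both the size of the figure and why efficiency alone doesn’t shrink it. The sketch below is illustrative only: the utilization, efficiency gain, and demand growth values are hypothetical inputs, not numbers from OpenAI or Broadcom, and the model treats power as simply proportional to work done.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def annual_energy_twh(power_gw: float, utilization: float = 1.0) -> float:
    """Energy drawn over a year, in terawatt-hours, at a given average utilization."""
    return power_gw * HOURS_PER_YEAR * utilization / 1000.0


def total_power_gw(baseline_gw: float, efficiency_gain: float, demand_growth: float) -> float:
    """Power needed when work grows `demand_growth` times and each unit of work
    needs 1 / (1 + efficiency_gain) of the energy it used to."""
    return baseline_gw * demand_growth / (1.0 + efficiency_gain)


if __name__ == "__main__":
    # 10 GW drawn around the clock for a full year.
    print(f"{annual_energy_twh(10.0):.1f} TWh/year")
    # Hypothetical: chips 50% more efficient, but inference demand triples.
    print(f"{total_power_gw(1.0, 0.5, 3.0):.1f}x baseline power")
```

Run flat out, 10 GW comes to about 87.6 TWh a year. In the hypothetical second case, a 50% efficiency gain paired with a tripling of demand still doubles total power draw, which is the rebound dynamic described above.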

A shot at Nvidia

For Broadcom, Jalapeño is another entry in a custom ASIC business that already counts Google, Meta, and ByteDance among its clients. Broadcom’s CEO has said the company has over $10 billion in orders for AI racks based on its XPUs, and OpenAI, long rumored to be the unnamed fourth client working on an inference chip once code-named “Titan,” is now effectively confirmed as part of that pipeline. Broadcom positions itself as the go-to design partner for hyperscaler-designed accelerators, and every named customer strengthens that pitch.

Plenty remains unknown. There’s no public word on the fabrication node, transistor counts, the HBM generation in use, or the finalized pricing model. Until those land alongside independent benchmarks, the efficiency claims sit firmly in the “promising, unverified” column.

There’s also a competition angle. Custom in-house silicon could deepen the moat for the largest AI players, raising the barrier to entry for smaller startups that can’t fund their own hardware programs. That cuts both ways. The same shift also chips away at Nvidia’s dominance, which regulators might read as increasing competition in AI hardware rather than reducing it.

Against Nvidia’s market position, Jalapeño validates Broadcom’s bet on integrated systems over generalized GPU adoption. The broader picture is a bifurcation that’s been forming for a while now. GPUs like Nvidia’s remain critical for frontier training, while inference increasingly moves to custom chips — TPUs, XPUs, and Jalapeño-class ASICs — inside hyperscale datacenters. Jalapeño is OpenAI’s most concrete move yet in that direction.

Created by RCR Wireless News. Telecom Industry editorial excellence since 1982
