The AMD MI350P slots into existing server racks

The 600W MI350P is essentially a halved MI350X

In sum – what we know:

Drop-in form factor – The MI350P is a dual-slot, air-cooled PCIe Gen5 card built to slot into existing 19-inch servers without new liquid cooling.
Half an MI350X – It roughly halves the compute, memory, and power of AMD’s flagship, landing at 128 CUs, 144 GB of HBM3E, and 600 W of board power.
Inference, not training – Broad FP8/FP4 support and up to 4.6 PFLOPS target LLM and agentic AI inference, with PCIe-only interconnect ruling out large-scale distributed training.

AMD has unveiled its latest data center chip, and while it’s not a massive generational upgrade, it does slot nicely into the company’s AI chip roadmap. The AMD Instinct MI350P PCIe GPU is a dual-slot, air-cooled CDNA 4 accelerator built to drop into standard PCIe Gen5 x16 servers. The idea, essentially, is that AMD wants to bring generative and agentic AI inference to enterprise data centers that already exist, rather than to the purpose-built liquid-cooled halls that have dominated the conversation for the past couple of years.

The framing here is “enterprise AI where you are.” The MI350P targets on-premises deployments that don’t want to rebuild or re-rack to accommodate ultra-dense AI systems, and it’s the first PCIe-based Instinct accelerator since the MI210 back in 2022. After several years focused on high-power OAM modules like the MI300 series and MI350X, AMD is returning to a slottable form factor.

The simplest way to understand the card is as an MI350X cut in half. AMD has taken its top-end accelerator and roughly halved the compute, memory, and power to fit a 600 W PCIe footprint. That’s a deliberate tradeoff, and it tells you exactly who this is for. Expect MI350P systems to arrive via OEM partners, with AMD also planning standalone card sales for upgrading existing servers.

Form factor and physical specifications

The MI350P is a dual-slot, full-height, full-length PCIe CEM card designed for mainstream 19-inch server chassis. It’s passively cooled and relies entirely on the host server’s airflow, with no direct-attach liquid or cold-plate requirement. That’s the whole point. If your servers can already move enough air, you don’t need to touch your cooling infrastructure.

It uses a PCIe Gen5 x16 interface and carries a Typical Board Power of 600 W, with a configurable 450 W option for more constrained environments. Power comes in via a standard 16-pin 12VHPWR connector, which is now common on high-power GPUs. In practice, the card is meant to act as a drop-in upgrade for many existing 1U and 2U server GPU bays, assuming the host has sufficient power delivery and airflow. 600 W is still a lot to feed and cool, so “drop-in” comes with caveats, but it’s a far easier ask than the 800 W to 1000 W liquid-cooled modules at the high end of the market.
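To make the “drop-in, with caveats” point concrete, here is a minimal sketch of the power arithmetic. The 600 W and 450 W figures come from the article; the 3,000 W accelerator budget and the `max_cards` helper are hypothetical, and the eight-card cap is an assumption based on the per-server maximum AMD quotes.

```python
# Board power figures quoted in the article for the MI350P.
FULL_POWER_W = 600
REDUCED_POWER_W = 450


def max_cards(gpu_budget_w: float, per_card_w: float, slots: int = 8) -> int:
    """Cards that fit in a GPU power budget, capped by the number of slots.

    Hypothetical helper: it ignores airflow, which matters just as much
    for a passively cooled card.
    """
    return min(slots, int(gpu_budget_w // per_card_w))


# Hypothetical host with 3,000 W of PSU headroom reserved for accelerators.
budget_w = 3000
print("cards at 600 W:", max_cards(budget_w, FULL_POWER_W))
print("cards at 450 W:", max_cards(budget_w, REDUCED_POWER_W))
```

In this made-up host, the 450 W mode buys one extra card, which is the kind of tradeoff the configurable power option exists for.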

Architecture and memory

Underneath, the MI350P runs the same AMD CDNA 4 architecture as the MI350X. It packs 128 Compute Units, which lines up with the half-an-MI350X design, though AMD’s public messaging leans on AI TFLOPS rather than raw core counts.

On the memory side, you get 144 GB of HBM3E organized across four stacks, delivering up to 4.0 TB/s of aggregate bandwidth. That’s roughly half the 288 GB on an MI350X, and it’s a meaningful number for inference. 144 GB is enough to keep substantial LLMs resident on a single GPU, anywhere from tens to low hundreds of billions of parameters depending on quantization, with multi-GPU sharding available for larger models. The card also supports GPU virtualization and partitioning through AMD’s SR-IOV-based stack, allowing up to four partitions per GPU. That makes it possible to share a single card across multiple smaller LLMs or microservices, which is genuinely useful in multi-tenant on-prem clusters.
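The “tens to low hundreds of billions of parameters” range follows from simple arithmetic on bytes per weight. The sketch below is a rough estimate, not a sizing tool: the 20% reserve for KV cache and activations is an assumed figure, the helper names are hypothetical, and per-block scaling metadata in formats like MXFP4 is ignored.

```python
HBM_GB = 144  # MI350P capacity quoted in the article

# Bytes per parameter for common weight formats (scaling metadata ignored).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weights_gb(params_billion: float, fmt: str) -> float:
    """Weight footprint in GB, taking 1 GB as 1e9 bytes."""
    return params_billion * BYTES_PER_PARAM[fmt]


def fits(params_billion: float, fmt: str, reserve_fraction: float = 0.2) -> bool:
    """True if the weights fit after reserving part of HBM for KV cache.

    reserve_fraction is an assumption; real headroom depends on batch
    size and context length.
    """
    usable_gb = HBM_GB * (1 - reserve_fraction)
    return weights_gb(params_billion, fmt) <= usable_gb


for params, fmt in [(70, "fp16"), (70, "fp8"), (200, "fp4")]:
    verdict = "fits" if fits(params, fmt) else "does not fit"
    print(f"{params}B @ {fmt}: {weights_gb(params, fmt):.0f} GB, {verdict}")
```

Under these assumptions a 70B model needs FP8 or lower to sit comfortably on one card, while a 200B model only becomes plausible at FP4.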

Performance and supported formats

AMD’s headline figure is up to 4,600 TFLOPS, or 4.6 PFLOPS, of AI compute at MXFP4, the FP4-class microscaling format. AMD claims this is the highest AI performance currently available in an enterprise PCIe card. Step up to higher-precision formats like FP8 and MXFP8 and you’re looking at up to 2,299 TFLOPS, depending on mode.

Format support is broad. The card handles FP8, MXFP8, FP16/BF16, INT8, MXFP4/FP4, and FP32, with 2:4 structured sparsity acceleration for many of the low-precision formats. FP8, FP4, and INT8 are exactly the formats that matter most for modern LLM and generative AI inference, and that’s no accident. The capacity and the supported formats are tuned for single-GPU inference of large models, not for the highest-end training jobs.
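One way to read these peak numbers against the 4.0 TB/s of memory bandwidth is a simple roofline estimate. The sketch below uses only figures quoted in the article; the function names are hypothetical, and the decode bound is a textbook upper limit for a single request stream that ignores KV-cache reads, batching, and kernel efficiency.

```python
PEAK_MXFP4_FLOPS = 4600e12  # 4.6 PFLOPS, quoted in the article
PEAK_FP8_FLOPS = 2299e12    # FP8/MXFP8 figure quoted in the article
HBM_BYTES_PER_S = 4.0e12    # 4.0 TB/s, quoted in the article


def ridge_point(peak_flops: float, bytes_per_s: float) -> float:
    """FLOPs per byte of memory traffic needed to become compute-bound."""
    return peak_flops / bytes_per_s


def decode_tokens_per_s_bound(weight_bytes: float) -> float:
    """Upper bound on single-stream decode speed.

    Each generated token has to read every weight once, so memory
    bandwidth, not peak TFLOPS, sets the ceiling.
    """
    return HBM_BYTES_PER_S / weight_bytes


print("MXFP4 ridge point:", ridge_point(PEAK_MXFP4_FLOPS, HBM_BYTES_PER_S))
print("FP8 ridge point:", ridge_point(PEAK_FP8_FLOPS, HBM_BYTES_PER_S))
# Hypothetical 70B-parameter model stored at one byte per weight (FP8).
print("70B FP8 decode bound:", round(decode_tokens_per_s_bound(70e9), 1))
```

The takeaway is that the headline TFLOPS only come into play for compute-dense phases such as prefill or large batches; low-batch decoding is limited by how fast weights stream out of HBM.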

Deployment and interconnectivity

You can run anywhere from one to eight MI350P GPUs per server, with the actual ceiling set by available PCIe slots, power budget, and cooling. The catch is how those GPUs talk to each other. There’s no dedicated high-speed GPU-to-GPU fabric here, no on-card Infinity Fabric links and no NVLink equivalent. Inter-GPU communication runs over standard PCIe Gen5, which caps peer bandwidth at around 64 GB/s in each direction, or roughly 128 GB/s bidirectional, per x16 link.

That’s a real limitation. For large-scale distributed training or model-parallel workloads that depend on extremely high GPU-to-GPU bandwidth, PCIe will be a bottleneck. But for the workloads AMD is actually targeting, it’s probably a non-issue. Inference, RAG, and most agentic AI pipelines either keep the model mostly within a single GPU’s HBM or use limited cross-GPU communication, and in those cases PCIe bandwidth is generally adequate. This is a card built around its strengths, not one pretending to do everything.
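The PCIe ceiling can be derived from the link parameters. This sketch assumes the standard PCIe Gen5 values of 32 GT/s per lane and 128b/130b encoding, and it ignores packet and protocol overhead, so achievable throughput will be lower. The function name is hypothetical.

```python
HBM_GB_PER_S = 4000.0  # 4.0 TB/s of on-card bandwidth, quoted in the article


def pcie_gb_per_s_per_direction(
    gt_per_s: float = 32.0,         # PCIe Gen5 raw rate per lane
    lanes: int = 16,
    encoding: float = 128 / 130,    # 128b/130b line encoding
) -> float:
    """Usable link bandwidth in GB/s for one direction, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte


one_way = pcie_gb_per_s_per_direction()
print(f"per direction: {one_way:.1f} GB/s")
print(f"bidirectional: {2 * one_way:.1f} GB/s")
print(f"HBM is about {HBM_GB_PER_S / one_way:.0f}x faster than one PCIe direction")
```

The gap of well over an order of magnitude between on-card HBM and the PCIe link is why workloads that stay inside one GPU’s memory suit this card, and why heavy model-parallel traffic does not.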

Target workloads

The MI350P is squarely an inference product. AMD positions it for generative AI inference across LLMs, image and audio generation, and code models, rather than primary training of the very largest models. Alongside that, it’s aimed at agentic AI pipelines, the multi-step, tool-using agents and workflow orchestration that combine LLMs, vector databases, and back-end tools. RAG pipelines and search applications are another headline use case, as is classical ML and analytics acceleration for things like ETL and embeddings.

The underlying argument is that CPU-only infrastructure is increasingly insufficient for these workloads. The MI350P is meant for organizations that need more AI compute than CPUs can reasonably provide but aren’t ready to invest in fully bespoke AI supernodes. That’s a sizeable middle ground.

Software and ecosystem

On the software side, the MI350P plugs into AMD’s open, enterprise-ready AI stack built around the ROCm 6 family. It’s compatible with PyTorch and other mainstream frameworks, supports the Kubernetes GPU Operator for containerized deployments, and is backed by AMD Inference Microservices that provide pre-built containers and orchestration tooling for LLMs, RAG, and other services.

AMD is leaning hard on the “open ecosystem” angle. There are no per-GPU software license fees, reference stacks are free and open-source where possible, and the stated goal is to lower TCO and reduce vendor lock-in compared to more proprietary stacks. That’s a genuine selling point, particularly for enterprises wary of getting boxed in.

That said, AMD’s software ecosystem has historically lagged NVIDIA’s CUDA in maturity and tooling. Enterprises comfortable with CUDA may need to invest in re-tooling, or lean on translation layers and vendor-provided containers. How quickly ROCm-based stacks for vLLM and other popular inference servers mature is the single biggest factor to watch as the MI350P rolls out. The hardware can be excellent and still lose deals on ecosystem friction.
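Part of the re-tooling burden is softened by the fact that ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` namespace that CUDA code already uses. The sketch below is a generic check, not anything specific to the MI350P, and `describe_backend` is a hypothetical helper name.

```python
def describe_backend() -> str:
    """Report which GPU backend, if any, this PyTorch install is using."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # torch.version.hip is set on ROCm builds and None on CUDA or CPU builds.
    hip_version = getattr(torch.version, "hip", None)

    # On ROCm builds, AMD GPUs are reported through the torch.cuda API.
    if not torch.cuda.is_available():
        return "PyTorch found, but no GPU is visible"

    name = torch.cuda.get_device_name(0)
    backend = f"ROCm/HIP {hip_version}" if hip_version else f"CUDA {torch.version.cuda}"
    return f"{name} via {backend}"


print(describe_backend())
```

Code that only touches `torch.cuda` and standard PyTorch operators tends to move over with little change; custom CUDA kernels and vendor-specific libraries are where the friction usually appears.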

The bottom line

Compared to the MI350X, the MI350P gives up half the compute and memory, runs at lower power, and drops the external high-speed fabrics, but in exchange you get a form factor that actually fits into the servers enterprises already own. Compared to the MI210, the previous PCIe Instinct from 2022, it’s a generational leap. You’re getting modern AI formats like FP8 and FP4, a large jump in HBM capacity and bandwidth, and multi-PFLOP performance aligned specifically with LLMs and generative AI.

The competition is the obvious one. The MI350P lands in a market dominated by NVIDIA’s PCIe and SXM accelerators, the H100 and H200 series and their successors, with Intel’s Gaudi line also in the mix. AMD talks up leadership cost-performance, but official pricing for the MI350P hasn’t been detailed, and real-world cost-performance will lean heavily on OEM bundles and channel deals.

Created by RCR Wireless News. Telecom Industry editorial excellence since 1982

AMD’s new MI350P brings PFLOP-class AI inference to the servers you already own