The collaboration scales Google Cloud infrastructure with Vera Rubin GPUs
In sum – what we know:
- Massive hardware scaling – New A5X instances featuring Nvidia Vera Rubin GPUs can scale to nearly one million GPUs in multisite clusters using Google’s Virgo networking.
- Targeting agentic workflows – The partnership integrates Nvidia Nemotron models and NeMo tooling into Google Cloud to help developers build agents that reason and execute complex tasks.
- Flexible deployment options – Updates include fractional G4 VMs for cost-efficient right-sizing and Gemini on Google Distributed Cloud for sensitive on-prem or edge AI requirements.
Nvidia and Google Cloud have been working together for some time now, but recently they announced an expansion of that collaboration. The announcement takes aim squarely at the infrastructure and software stack needed to push agentic and physical AI workloads into production environments.
As Mark Lohmeyer, VP & GM of AI and Computing Infrastructure at Google Cloud, put it, “by combining Google Cloud’s scalable infrastructure and managed AI services with NVIDIA’s industry-leading platforms, systems and software, we’re giving customers flexibility to train, tune and serve everything from frontier and open models to agentic and physical AI workloads.”
In other words, the goal is to make moving from prototype to production less painful than it is right now, both for enterprises and for smaller startups. That said, it remains to be seen how fast the demand for physical AI workloads actually ramps up.
Hardware infrastructure updates
The biggest hardware news is A5X bare-metal instances running on Nvidia Vera Rubin GPUs. According to the companies, single-site clusters can scale to 80,000 Rubin GPUs, while multisite clusters stretch to 960,000. Nvidia ConnectX-9 SuperNICs paired with Google’s next-gen Virgo networking form the data fabric connecting all of it.
On the other end of the spectrum, fractional G4 VMs powered by Nvidia RTX PRO 6000 GPUs address a long-standing annoyance in cloud GPU economics. Not every workload needs a full GPU, and often that means paying for capacity you’re not using. Being able to right-size GPU allocation is helpful for smaller or bursty tasks where cost efficiency matters.
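The cost argument above is simple linear arithmetic, and a tiny sketch makes it concrete. Everything here is hypothetical: the hourly rate, the available fractions, and the assumption that price scales linearly with the allocated slice are illustrative stand-ins, not published Google Cloud pricing.

```python
# Hypothetical illustration of fractional-GPU right-sizing.
# The rate and fractions are made-up example numbers, not real
# Google Cloud or Nvidia pricing.

FULL_GPU_RATE = 4.00            # assumed $/hour for a full GPU instance

def hourly_cost(fraction: float, hours: float, rate: float = FULL_GPU_RATE) -> float:
    """Cost of renting `fraction` of a GPU for `hours` hours,
    assuming price scales linearly with the allocated fraction."""
    return fraction * rate * hours

# A bursty workload that only needs about a quarter of a GPU for 100 hours:
full_gpu = hourly_cost(1.0, 100)       # pay for the whole GPU regardless
right_sized = hourly_cost(0.25, 100)   # pay only for the fraction used
print(f"full GPU: ${full_gpu:.2f}, fractional: ${right_sized:.2f}")
# full GPU: $400.00, fractional: $100.00
```

Under these assumptions the fractional allocation cuts the bill fourfold for the same work; the real savings depend on actual rates and on which fractions the G4 VMs expose.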
For teams doing cutting-edge model research, A4X Max VMs with Nvidia GB300 GPUs deliver accelerated performance specifically tuned for training workloads. There’s also now Confidential VM support for Nvidia Blackwell GPUs, which brings hardware-level security to sensitive AI workloads.
And then there’s Gemini on Google Distributed Cloud, now in preview on Nvidia Blackwell and Blackwell Ultra GPUs. This pushes Google’s Gemini models toward the edge and into on-prem environments, which is important for organizations that want AI capabilities but either can’t or won’t route their data through the public cloud.
Software and service expansions
On the software side, the Gemini Enterprise Agent Platform is picking up new agentic capabilities via Nvidia’s Nemotron open models and the NeMo framework. The goal is to give developers a more complete set of tools for building AI agents that reason, plan, and execute across complex workflows, rather than simply answering questions. NeMo provides open-source tooling for end-to-end agentic workflows, which should make it easier for teams to experiment with agent architectures without having to build the entire stack themselves.
Vertex AI is getting upgrades too, with training clusters now specifically optimized for complex reasoning workloads. That’s relevant for anyone fine-tuning large models or training agents that need to reliably handle multi-step tasks. Google Cloud’s AI Hypercomputer platform is expanding alongside it, with particular emphasis on AI factories and physical AI applications.
Enterprise and startup use cases
In the announcement, Nvidia and Google highlighted some use cases for the new tech. Snap has cut costs on large-scale A/B testing by moving its data pipelines to GPU-accelerated Spark on Google Cloud. Schrödinger, meanwhile, has compressed drug discovery simulations that used to take weeks down to hours using Nvidia accelerated computing on Google Cloud.
On the research side, Thinking Machines Lab is using the AI Hypercomputer for frontier model training and platform development. Among emerging startups, Inferact is building a vLLM inference engine on a cluster of Nvidia GB200s, Neuraldefend is developing real-time deepfake detection on H100 and L4 GPUs, and Aible is deploying Nemotron 3 and NemoClaw agents on BigQuery through serverless RTX-accelerated Cloud Run. These startup examples speak to the breadth of what’s being built on the joint infrastructure, even though many of these companies are still relatively early-stage.
Some of the partnership’s momentum shows up in community and recognition numbers. Over 90,000 developers have joined the joint Nvidia and Google Cloud developer community in just over a year. Nvidia also picked up Google Cloud Partner of the Year awards for both AI Global Technology and Infra Modernization Compute. Those awards are somewhat ceremonial, sure, but they do signal just how central this relationship has become to Google Cloud’s overall AI strategy.