In large-scale AI training, computing clusters often reach only 30–50% of their theoretical performance because GPUs sit idle while waiting to communicate with one another. In fact, communication and synchronization bottlenecks in massive GPU clusters can cost data center operators hundreds of thousands to millions of dollars per day.
On Monday, RCRTech will break down highlights from a recent AI TechTalk with Suresh Vasudevan, CEO of Clockwork Systems, an AMD- and Broadcom-backed company that is attracting attention from neoclouds, large enterprises, hyperscalers, and anyone deploying AI workloads on tens of thousands or even hundreds of thousands of GPUs. According to Vasudevan, “a 1,000-GPU cluster can typically have two to four disruptive events on a daily basis, bringing losses of $5 million–$8 million out of about $50 million spent on that size cluster.”
Check back to see how software-driven solutions can synchronize server clocks to within nanoseconds, optimizing communication among GPUs and raising GPU cluster utilization for both training and inference workloads.

Susana Schwartz
Technology Editor
RCRTech
AI Infrastructure Top Stories
Idle GPUs cost millions: Large-scale computing clusters often reach only 30–50% of their theoretical performance, and a 1,000-GPU cluster typically sees two to four disruptive events per day. Clockwork.io CEO Suresh Vasudevan digs into the issue.
APAC as DC growth engine: According to McKinsey, traditional compute, storage, and cloud workloads currently account for more than 70% of APAC data center demand, while AI training and inference workloads represent roughly 30%.
AI Today: What You Need to Know
IBM sub-1 nm architecture: IBM touts energy savings with the debut of the world’s first sub-1-nanometer chip. Using a “nanostack” 3D transistor architecture at the 0.7 nm node, it crams 100 billion transistors onto a fingernail-sized piece of silicon.
PA bill might end DC tax credit: The Pennsylvania House of Representatives passed House Bill 2198 by a 197–5 vote. If it also passes the Senate, it would eliminate a 2021 policy that exempts data centers from paying state sales tax on data center equipment.
Micron’s AI-driven ascent: Micron Technology briefly surpassed Meta and Tesla in market valuation after securing a blockbuster $22 billion in customer commitments for its memory chips, highlighting intense infrastructure demand.
RCR Events
Quantum Safe Networks Forum, July 14th
Quantum Safe Networks Forum brings together telecom operators, cybersecurity experts, and industry analysts to explore how to build resilient, future-ready infrastructure in the face of quantum disruption. Register now
RCR Roundtables: AI Infrastructure, October 21st, Dallas, Texas
Join 50 senior data center, energy and AI leaders at the Ritz-Carlton Dallas on October 21 for invitation-only roundtables on powering and scaling AI. Request your invitation
Industry Resources
Webinar, June 29th: Agentic RAN Management: Delivering OPEX efficiency and a path to 6G
Webinar, June 30th: Building the 6G Standard: Key developments to know
Webinar, July 7th: Noise-Figure Measurements with RFmx and PXI VSTs
Webinar, July 16th: NTN in motion — evolving standards, expanding services
Whitepaper: Powering sovereign AI at scale
Whitepaper: Scalable database design for 5G and beyond
Report: Scaling AIOps from insight to action
Summit Access: GSMA Device Enablement Summit: How operators can fix device-network fragmentation
Whitepaper: Telco AI Enabler: Mediation’s defining role
Report: Securing telecom infrastructure for the quantum era
Report: Scaling optical networks for the AI and hyperscale era