Your AI infrastructure is only as good as your DNS strategy and the resiliency of your cloud and network infrastructure
As the AI ecosystem expands and becomes more complex, it will be increasingly important to consider DNS a mission-critical component of AI infrastructure. As evidenced by this week’s AWS outage, which affected thousands of companies and millions of customers, DNS is central to the performance, security, and seamless connectivity of data sources, AI models, and multiple cloud environments.
Check out the steps both cloud providers and their customers can take to mitigate the impact of DNS failures, a worthwhile investment of time and resources given the massive losses from high-profile outages this year.
The cost of the AWS outage is projected to range from $75 million per hour in direct (collective) losses to hundreds of billions of dollars for the entire global ripple effect. It’s hard to believe single points of failure can have such far-reaching impacts, but outages at AWS, Cloudflare, Google Cloud, and Oracle Cloud are just some examples of how vulnerable even the biggest companies are. “It demonstrates both how far we’ve come and where we still need to focus,” Dolores Saiz, CEO of cloud consultancy The Server Labs, told RCR. “Today’s cloud platforms enable dramatically faster recovery times, but only if businesses have architected for resilience from the start.”
Resiliency will be increasingly important not only in cloud platforms but also in modern network infrastructure. With AI going into every layer of the network, continuous evaluation in lab and live environments will be needed. The testing capabilities most sought after for complex, AI-enriched networks will be network digital twins, synthetic testing, and continuous and active testing, all of which help network service providers optimize network performance, service delivery, and customer experience.
Susana Schwartz
Technology Editor
RCRTech
AI Infrastructure Top Stories
DNS strategy is mission critical: This week’s outage may accelerate improvements at major cloud providers and individual companies. Here we outline the key areas of weakness to strengthen.
Infrastructure under strain: The Vodafone outage, as well as that of AWS, shows that Industry 4.0 can fail without proper cloud and network redundancy. Some data has to stay on-site, and independent edge architectures are still important.
Testing AI-enhanced networks: Network digital twins, synthetic tests, and continuous and active testing are becoming crucial to optimizing network performance and unlocking opportunities for improved service delivery.
AI-Powered Telecom Infrastructure
Supermicro, in collaboration with NVIDIA, delivers AI-powered infrastructure tailored for telcos, enhancing operational efficiency, network management, and customer experiences.
AI Today: What You Need to Know
AI model revenues: DeepSeek was the only AI model to generate a positive return yesterday, despite having the smallest development budget among its peers.
New data center: Port Washington was chosen for a $15 billion Stargate project in which OpenAI, Oracle, and Vantage Data Centers will develop a data center campus.
Avoiding project failures: Agentic AI projects fail when business leaders deploy the technology indiscriminately. HBR says to focus on use cases with measurable business value.
AI infrastructure funding: 1001 raised $9 million to build AI infrastructure for critical sectors. The funding round was led by CIV, General Catalyst and Lux.
Revenue growth: TSMC, the world’s largest contract chipmaker, reported record results in Q3 2025, with a 39% jump to NT$452.3 billion ($14.77 billion).
Network needs for inferencing: AI inferencing drives new network requirements, with inferencing use cases raising the bar for AI’s supporting infrastructure.
Data center cooling: Johnson Controls announces investment in data center liquid cooling company Accelsius, a leader in two-phase, direct-to-chip liquid cooling tech.