AI traffic reshapes network behavior and assurance models

Traditional network monitoring and assurance no longer provide a full picture of the user experience in the agentic AI world

The rapid proliferation of AI and agentic workloads across the network is introducing new traffic patterns and behaviors, rendering older network monitoring and assurance approaches inadequate.

In an interview on Pulse, Tom Foottit, senior director of product management at Cisco, argued that AI workloads are inherently different from the workloads networks were originally designed to support.

Large language models (LLMs) and AI agents trigger hundreds or even thousands of interactions every minute. “It’s not just that I’m asking a question to ChatGPT. I’m working with an AI agent to help build a product definition [for example], and that agent is working with other agents,” he said.

Those constant exchanges create “a tree of interactions” that are real-time and non-deterministic. “It is interactive traffic,” Foottit said. “Like voice or video conferencing versus a streaming video where you can buffer and deal with that kind of thing.”

As organizations deploy more bespoke and private AI models, the number of moving parts inside the network is only going to multiply. 

How operators measure and understand these dynamic interactions and interdependencies is critical to getting a true picture of the user experience, Foottit argued. “[It] is very different than the ones that we were solving for 5, 10, 15 or 20 years ago.”

That is not to say that the techniques of yesteryear are irrelevant today. “The techniques that we’ve used over the years are still valid,” Foottit emphasized. “But if you look at the history of service assurance, a lot of work was done at Layer 2 or 3 in the network.”

Measuring delay, packet loss, and jitter was sufficient when assuring the network for voice, video, and web services, as that traffic could often be buffered, cached, or optimized through static delivery models. According to Foottit, those traditional network metrics still matter and are essential for benchmarking performance for AI training, inference, and agents.

However, they are no longer sufficient by themselves to understand the user experience in AI-driven environments. In addition to those, operators must now analyze metrics like LLM request latency and inter-token latency that shed light on how network conditions are affecting model and agent performance.
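As a rough illustration of what those two metrics capture, the Python sketch below summarizes a single streamed LLM response from token arrival timestamps. The article does not define the metrics, so the definitions here are assumptions: request latency is taken as the time from sending the request to receiving the last token, and inter-token latency as the gap between consecutive tokens. The function name `llm_latency_metrics` and its parameters are hypothetical.

```python
from statistics import mean


def llm_latency_metrics(request_sent, token_arrivals):
    """Summarize latency for one streamed LLM response.

    request_sent: time the request left the client, in seconds.
    token_arrivals: arrival time of each token, in seconds, ascending.
    Metric definitions are assumptions, not taken from the article.
    """
    if not token_arrivals:
        raise ValueError("no tokens received")

    # Gap between each token and the one before it.
    gaps = [
        later - earlier
        for earlier, later in zip(token_arrivals, token_arrivals[1:])
    ]

    return {
        # Request sent until the final token arrives.
        "request_latency": token_arrivals[-1] - request_sent,
        # Typical spacing between tokens.
        "mean_inter_token_latency": mean(gaps) if gaps else 0.0,
        # Worst single stall in the stream.
        "max_inter_token_latency": max(gaps) if gaps else 0.0,
    }


metrics = llm_latency_metrics(0.0, [0.5, 0.75, 1.0, 2.0])
print(metrics)
```

In this made-up sample, the last token arrives a full second after the one before it. The largest inter-token gap exposes that stall, while the request latency alone would only show a slower response overall.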

Another reason assurance approaches need to be modernized is that many operators still measure network components in isolation instead of focusing on the end-user experience across the entire AI workflow. The complexity and unpredictability of AI traffic demand an assurance framework that is unified for the network, applications, and AI models, Foottit said. “The user’s experience is really an amalgamation of all of that,” he said.

He added that assurance capabilities must be hardwired into the infrastructure from the outset rather than planned and added after problems emerge. “You need to bake [them] into the design of the network versus coming along after the fact,” he said. This thinking becomes increasingly important as operators strive toward autonomous and self-healing networks. 

Now is a good time to prepare the assurance model for AI. “[AI traffic] is not something that’s dominating networks yet, like video and voice before it did for so many years. But it’s obviously on a ramp. And so it’s about looking at that and saying, there may be some new techniques, new things that we need to be able to do in order to understand the experience users are getting from a network when they’re interacting with AI.”

To companies beginning the transition, Foottit said, “If you’re running AI workloads in the network, you want to use AI as well to be able to understand those workloads, how they’re behaving, and be able to deliver the best service to customers. You need to be able to build that measurement into the network.”

But before that, they must evaluate their current assurance strategy. “The biggest thing is to step back and say, am I using the same techniques that I used 10, 15, 20 years ago to measure network performance? Or am I starting with a clear view and clean eyes and looking at this and saying, what makes sense to measure now?”
