Edge computing isn’t dead. It was just waiting for the right moment to shine, and that moment has arrived with AI
A growing number of AI inference and physical AI use cases are demanding compute capabilities at the far reaches of the network. Roy Chua, principal analyst at AvidThink, says that even though edge computing never made it beyond a handful of niche deployments in its first wave, the new crop of ultra-low-latency applications driven by AI may change the narrative this time around.
In the latest episode of Pulse, Chua says the compute model emerging from this shift is not an all-cloud versus all-edge play, but a layered approach in which different parts of the inference pipeline are executed in different locations.
Specifically, Chua highlighted two trends crystallizing around edge AI. First, he said, companies are borrowing the placement logic of earlier edge deployments and carrying it over to AI. “With computer vision workloads for example, some of the precomputing may happen at a remote site — either on camera or locally — and then some of the more complicated operations get passed up to the edge, and then eventually, some get sent to a centralized data center for long-term storage. It’s still the same in the LLM [large language model] era,” he says.
With inference workloads, he says, “doing some of the embeddings, the filtering up front, and then sending that up where you do some of the other elements of inference, like prefill and reranking at the edge,” would be a more practical way to split the load. Some of the heavy reasoning work that takes more compute can then be put in a centralized location for access to on-demand compute and storage.
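The layered split Chua describes can be sketched as a simple placement map. The stage names and tier assignments below follow his description (embeddings and filtering up front, prefill and reranking at the edge, heavy reasoning centralized), but the function and its defaults are illustrative assumptions, not a real orchestration API.

```python
# Sketch of a layered inference pipeline: lightweight preprocessing stays
# on-site, mid-weight stages run at the edge, and compute-heavy reasoning
# is sent to a centralized data center.
STAGE_TIERS = {
    "embedding": "on-site",   # cheap, latency-sensitive preprocessing
    "filtering": "on-site",
    "prefill":   "edge",      # mid-weight inference steps
    "reranking": "edge",
    "reasoning": "central",   # heavy compute with on-demand elasticity
    "storage":   "central",   # long-term retention
}

def place_stages(pipeline):
    """Group an ordered list of pipeline stages by execution tier."""
    placement = {"on-site": [], "edge": [], "central": []}
    for stage in pipeline:
        tier = STAGE_TIERS.get(stage, "central")  # unknown stages go upstream
        placement[tier].append(stage)
    return placement

print(place_stages(["embedding", "filtering", "prefill", "reranking", "reasoning"]))
```

A real deployment would decide placement dynamically based on load, network conditions, and model size rather than a static lookup table.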
Apple is a good case study of this. Apple’s workloads are distributed between devices, data centers, and partners like Google and OpenAI, which it relies on for heavy-duty, complex processing. This disaggregation of workloads across the pipeline will become a dominant feature in the era of edge AI inference.
Second, small language models (SLMs) that perform certain agentic functions or simple reasoning in constrained domains will be pushed outward to the edge, given their lighter compute needs and the stringent latency requirements of those tasks, Chua says.
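The placement rule for SLMs can be stated as a simple routing decision: if a task fits a constrained domain and a tight latency budget, serve it from an edge-hosted SLM; otherwise fall back to a centralized large model. This is a minimal sketch of that idea; the latency cutoff, domain names, and tier labels are all illustrative assumptions.

```python
# Hedged sketch: route a request to an edge SLM only when the task domain
# is one the small model handles AND the latency budget is tight enough
# that a round trip to a central data center is impractical.
EDGE_LATENCY_BUDGET_MS = 100  # assumed cutoff for "real-time" tasks

def choose_model(task_domain, latency_budget_ms, edge_domains):
    """Pick a serving tier for a request based on domain and latency."""
    if task_domain in edge_domains and latency_budget_ms <= EDGE_LATENCY_BUDGET_MS:
        return "edge-slm"
    return "central-llm"

edge_domains = {"device-control", "local-search"}
print(choose_model("device-control", 50, edge_domains))  # edge-slm
print(choose_model("open-ended-qa", 50, edge_domains))   # central-llm
```

In practice the routing signal would also include model capability, cost, and the data-sovereignty constraints discussed below.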
Beyond that, data sovereignty rules will dictate more broadly to which geographies workloads belong, as governments place tighter controls on where data can reside and how it can be processed.
However, computing at the edge has challenges of its own that often get overlooked. Chief among them are operational overhead, orchestration complexity, the need for edge-compatible DevOps toolsets, and model drift stemming from a distributed architecture. “Fundamentally, when you try to manage models and data over the whole lifecycle, the edge is just a bit more complicated,” Chua says.
Additionally, he notes that operators must be mindful of the fact that the edge does not offer the elasticity of the cloud, which may make it unsuitable for certain types of workloads.
Telcos have been bullish on edge computing from the beginning. They have built infrastructure that is now capable of running AI workloads natively and formed partnerships with the big hyperscalers to advance their edge plays. But their margins have remained slim.
“The uptake wasn’t what we all collectively expected,” Chua admits, but he adds that telcos still believe there is a play here. “They believe that some level of sovereignty and ownership of what we call ‘beachfront assets’ so close to the users gives them an advantage, and that the low-latency, high-capacity fiber networks they own in some cases should bring them value.”
At the same time, there are constraints to acknowledge. Limited compute capacity at the edge and the varying demands of AI inference workloads will require telcos to reassess their situation before getting back into the game. There are many open questions at this point: what kinds of workloads fit at the edge, how to effectively manage AI models there, and how to position oneself in that ecosystem. “Those challenges existed before and they still exist today,” he says.
Collaborative AI agents further add a layer of complexity to the distributed inference fabric. “I’m still not entirely convinced that we figured out how to make the agents collaborate correctly, or put the guardrails in place, but it is happening,” Chua says.
What is likely to happen initially is a level of discovery between local agents, which could lead to interesting scenarios, such as a drone and a warehouse robot agent talking to each other over Agent2Agent or another protocol. But it will all depend on how tightly telcos can control latencies in their networks to make multi-step inferencing possible at scale in real time, he says.
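The discovery step between local agents can be illustrated with a toy registry: each agent advertises its capabilities, and peers look each other up by the capability they need. This models only discovery; the actual Agent2Agent protocol messages, agent cards, and transports are not reproduced here, and all names below are hypothetical.

```python
# Toy sketch of local agent discovery: agents register capabilities with a
# shared registry, and a drone agent can then find a warehouse-robot agent
# that advertises the capability it needs.
class AgentRegistry:
    def __init__(self):
        self._agents = {}

    def register(self, name, capabilities):
        """Advertise an agent and the set of capabilities it offers."""
        self._agents[name] = set(capabilities)

    def discover(self, capability):
        """Return the names of agents advertising a capability."""
        return sorted(n for n, caps in self._agents.items() if capability in caps)

registry = AgentRegistry()
registry.register("drone-7", {"aerial-survey", "package-pickup"})
registry.register("warehouse-bot-3", {"shelf-retrieval", "package-handoff"})

# The drone looks for a peer that can accept a package handoff.
print(registry.discover("package-handoff"))  # → ['warehouse-bot-3']
```

In a real deployment this lookup would happen over the network, which is why the latency guarantees Chua mentions matter for multi-step agent interactions.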