O2 Telefónica deploys agentic AI for service-centric telco operations
A new Large Telco Model in Germany moves autonomous network ambitions from slideware to the service desk, with clear implications for how CSPs run network, IT, and customer operations.
Deployment details: Large Telco Model powering the Service Experience Center
O2 Telefónica in Germany is deploying a Large Telco Model to power its Service Experience Center, part of its broader Operations of the Future program. The initiative, built with Tech Mahindra and NVIDIA, targets end-to-end service centricity across network, IT, and customer domains by fusing telemetry, tickets, playbooks, and service topology into an AI-driven operating fabric. Tech Mahindra first previewed its LTM at NVIDIA GTC in March 2025; the model is being trained and served on NVIDIA AI Enterprise software and tailored for telecom with NVIDIA NeMo and NVIDIA NIM microservices to accelerate customization and scaled inference.
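To make the serving pattern concrete, the minimal sketch below shows how an operations workflow might query a locally hosted NIM LLM endpoint over its OpenAI-compatible REST interface. The endpoint URL, model name, and prompt are illustrative assumptions, not details of O2 Telefónica's deployment.

```python
# Sketch: calling a locally hosted NIM LLM microservice for an incident summary.
# NIM microservices expose an OpenAI-compatible REST API; the URL, model name,
# and alarm reference below are illustrative assumptions.
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed local NIM endpoint

payload = {
    "model": "example-telco-llm",  # hypothetical model identifier
    "messages": [
        {"role": "system", "content": "You are a telco operations assistant."},
        {"role": "user", "content": "Summarize the likely service impact of "
                                    "alarm burst ALM-1042 on the 5G core in region South."},
    ],
    "temperature": 0.2,
    "max_tokens": 200,
}

resp = requests.post(NIM_URL, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```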
Why now: 5G complexity, opex pressure, and AI-native operations
Cloud-native 5G cores, multi-vendor RAN, fiber densification, and edge workloads have pushed operational complexity beyond what rule-based automations and static runbooks can sustain. At the same time, opex pressure, SLA tightness for B2B, and a shortage of experienced NOC talent demand faster root-cause isolation, fewer truck rolls, and higher first-touch resolution. By operationalizing agentic AI inside its Service Experience Center, O2 Telefónica signals a shift from domain-centric, device-level management to intent-driven, service-centric autonomy that is becoming a hallmark of AI-native telcos.
Inside O2 Telefónica's Large Telco Model: AI architecture for service-centric operations
The LTM blends generative and predictive AI with domain knowledge to translate service intent into safe, closed-loop actions.
Architecture and data: service graph, NeMo, NIM, and integrations
The stack relies on NVIDIA AI Enterprise for lifecycle management, with NeMo to adapt foundation models and NIM microservices to deploy specialized capabilities at scale. It ingests structured network data such as events, alarms, counters, and KPIs, and unstructured inputs like logs, methods of procedure, standard operating procedures, images, field notes, and even contextual business data. A live service graph ties these signals to subscribers, locations, and products, enabling the model to reason over cause and effect at the service level. Integration with orchestration, assurance, and field-force systems enables closed-loop actions through APIs, while policy and guardrails keep a human in the loop where needed.
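A minimal sketch of such a service graph, using networkx and an invented schema, illustrates how an alarm can be walked down to the customers it touches:

```python
# Sketch: a service graph tying alarms to sites, services, and subscribers so
# impact can be reasoned about at the service level. Node and edge names are
# illustrative assumptions, not O2 Telefónica's actual schema.
import networkx as nx

g = nx.DiGraph()

# Topology: a cell site carries a network slice that serves an enterprise customer.
g.add_edge("site:BER-0231", "service:slice-b2b-logistics", relation="carries")
g.add_edge("service:slice-b2b-logistics", "customer:acme-logistics", relation="serves")

# Telemetry: an alarm raised against the site.
g.add_edge("alarm:ALM-44871", "site:BER-0231", relation="raised_on")

def impacted_customers(graph, alarm):
    """Walk downstream from an alarm to find customers whose services it touches."""
    return [n for n in nx.descendants(graph, alarm) if n.startswith("customer:")]

print(impacted_customers(g, "alarm:ALM-44871"))
# ['customer:acme-logistics']
```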
Priority use cases: RCA, dispatch optimization, experience management
Initial value concentrates on repeatable, high-friction workflows. Automated root-cause analysis combines anomaly detection with generative explanations, cutting mean time to identify faults. Dispatch optimization weighs technician skills, parts availability, and site access to minimize truck rolls. Service experience management correlates customer-impacting events with network changes and third-party dependencies to prioritize remediation by business impact. Additional patterns include proactive incident prevention, change-risk prediction, capacity hot-spotting, and intent translation that turns plain-language objectives into executable workflows.
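As one concrete example of the dispatch pattern, the sketch below frames technician-to-ticket assignment as a cost-minimization problem with SciPy; the cost weights and data are illustrative assumptions rather than production dispatch logic.

```python
# Sketch: dispatch optimization as an assignment problem that blends travel time,
# skill mismatch, and parts availability into a single cost matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

technicians = ["tech_a", "tech_b", "tech_c"]
tickets = ["fiber_cut_T1", "antenna_swap_T2", "router_fault_T3"]

travel_min = np.array([[30, 55, 20],
                       [40, 25, 60],
                       [70, 35, 15]])          # minutes to reach each site
skill_gap = np.array([[0, 2, 1],
                      [1, 0, 2],
                      [2, 1, 0]])              # 0 = fully qualified
parts_penalty = np.array([[0, 0, 10],
                          [10, 0, 0],
                          [0, 10, 0]])         # 10 = required part not in the van

cost = travel_min + 20 * skill_gap + parts_penalty
rows, cols = linear_sum_assignment(cost)       # minimum-cost matching

for r, c in zip(rows, cols):
    print(f"{technicians[r]} -> {tickets[c]} (cost {cost[r, c]})")
```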
Agentic operations beyond copilots: TM Forum AN Levels 3–4 with guardrails
Unlike passive assistants, agentic systems can plan, call tools, and iterate toward a goal under supervision. The LTM approach advances toward TM Forum Autonomous Networks Levels 3–4, where systems can coordinate across domains with policy-based oversight. It aligns with industry work in ETSI ENI and extends beyond 3GPP SON by spanning RAN, transport, core, IT, and customer ops. Safety is enforced through role-based access, approval gates, and evaluation pipelines that score actions for drift, bias, and explainability before execution.
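A minimal sketch of the supervision pattern, with invented action names and a simple high-risk list standing in for real policy, shows how an approval gate can sit between an agent's proposal and execution:

```python
# Sketch: an approval gate between an agent's proposed action and execution.
# Action names, the risk list, and the approver role are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    target: str
    params: dict

HIGH_RISK = {"restart_core_function", "modify_routing_policy"}

def requires_approval(action: Action) -> bool:
    """Policy gate: anything on the high-risk list needs human sign-off."""
    return action.name in HIGH_RISK

def execute(action: Action, approved_by: str | None = None) -> dict:
    if requires_approval(action) and approved_by is None:
        return {"status": "pending_approval", "action": action.name}
    # In a real system this would call orchestration or assurance APIs.
    return {"status": "executed", "action": action.name, "approved_by": approved_by}

proposal = Action("restart_core_function", target="amf-cluster-2", params={"drain": True})
print(execute(proposal))                              # blocked until a human approves
print(execute(proposal, approved_by="noc_shift_lead"))
```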
Strategic implications for CSPs: outcomes, KPIs, and operating model
This deployment is less about a single model and more about a new operating model that blends AI, service topology, and automation with measurable outcomes.
Expected benefits and KPIs for AI-driven telco operations
CSPs should benchmark gains in mean time to detect and repair, percentage of incidents auto-triaged, reduction in truck rolls, SLA compliance for enterprise services, and customer experience metrics like NPS or CES. Secondary impacts include lower energy consumption via smarter load placement and fewer unnecessary site visits. The business case strengthens as models move from suggestion to supervised action, unlocking compounding opex and experience gains.
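The sketch below shows one way to compute such a baseline from incident records; the field names are illustrative assumptions rather than any operator's data model.

```python
# Sketch: computing operational KPIs (MTTD, MTTR, auto-triage rate, truck-roll rate)
# from incident records so AI-driven gains can be compared against a baseline.
from datetime import datetime
from statistics import mean

incidents = [
    {"opened": datetime(2025, 5, 1, 8, 0),  "detected": datetime(2025, 5, 1, 8, 12),
     "resolved": datetime(2025, 5, 1, 9, 30), "auto_triaged": True,  "truck_roll": False},
    {"opened": datetime(2025, 5, 2, 14, 0), "detected": datetime(2025, 5, 2, 14, 40),
     "resolved": datetime(2025, 5, 2, 18, 0), "auto_triaged": False, "truck_roll": True},
]

mttd_min = mean((i["detected"] - i["opened"]).total_seconds() / 60 for i in incidents)
mttr_min = mean((i["resolved"] - i["opened"]).total_seconds() / 60 for i in incidents)
auto_triage_rate = mean(i["auto_triaged"] for i in incidents)
truck_roll_rate = mean(i["truck_roll"] for i in incidents)

print(f"MTTD {mttd_min:.0f} min, MTTR {mttr_min:.0f} min, "
      f"auto-triaged {auto_triage_rate:.0%}, truck rolls {truck_roll_rate:.0%}")
```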
Key challenges: data readiness, integration, safety, and GPU cost
Data readiness is the first hurdle, including telemetry quality, lineage, and a consistent service inventory. Generative models introduce risks of hallucination, which must be controlled with retrieval-augmented generation, strong grounding, and action simulators. Integration maturity is critical; brittle OSS/BSS APIs limit closed-loop scope. GPU capacity planning and cost governance matter as inference scales. Operators should also weigh portability to avoid lock-in, enforce privacy by design for customer data, and meet audit and explainability requirements for regulated services.
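One lightweight grounding control is to verify that every entity an AI-generated explanation cites actually appears in the retrieved evidence before it is shown or acted on; the sketch below illustrates the idea with invented alarm and node identifiers.

```python
# Sketch: a grounding check that rejects generated root-cause text citing alarms
# or nodes not present in the retrieved evidence. IDs and regexes are illustrative.
import re

evidence = {
    "alarms": {"ALM-44871", "ALM-20911"},
    "nodes": {"site:BER-0231", "router:edge-07"},
}

generated_rca = ("Alarm ALM-44871 on site:BER-0231 indicates a fiber cut; "
                 "alarm ALM-99999 suggests a power issue at router:edge-07.")

cited_alarms = set(re.findall(r"\bALM-\d+\b", generated_rca))
cited_nodes = set(re.findall(r"\b(?:site|router):[\w-]+", generated_rca))

ungrounded = (cited_alarms - evidence["alarms"]) | (cited_nodes - evidence["nodes"])
if ungrounded:
    print("Reject or flag for review, ungrounded references:", ungrounded)
else:
    print("All cited entities are present in the evidence.")
```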
Ecosystem and standards: TM Forum Open APIs, GSMA, ETSI ENI
Adhering to TM Forum Open APIs and ODA components can speed integration and portability across vendors. Interworking with hyperscaler platforms and on-prem GPU clusters allows flexible deployment for latency-sensitive domains. Alignment with GSMA and NGMN guidance on AI-native telcos, together with contributions to shared evaluation benchmarks, will help the industry converge on safe, interoperable autonomous operations.
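For illustration, the sketch below raises a trouble ticket against a TMF621-style endpoint; the gateway URL, authentication, and exact payload fields are assumptions, and the published TM Forum specification remains the authoritative contract.

```python
# Sketch: creating a trouble ticket via a TM Forum Open API (TMF621) style endpoint.
# Base URL and payload fields are illustrative assumptions; consult the published
# TMF621 specification and the vendor's conformance profile for the real contract.
import requests

BASE_URL = "https://oss.example.internal/tmf-api/troubleTicket/v4"  # assumed gateway

ticket = {
    "name": "Degraded throughput on slice-b2b-logistics",
    "description": "AI RCA links KPI degradation to alarm ALM-44871 at site BER-0231.",
    "severity": "Major",
    "ticketType": "assurance",
}

resp = requests.post(f"{BASE_URL}/troubleTicket", json=ticket, timeout=15)
resp.raise_for_status()
print("Created ticket:", resp.json().get("id"))
```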
Roadmap to agentic AI: practical steps for AI-driven telco transformation
Operators and large enterprises should sequence adoption with near-term wins, strong guardrails, and a path to scale.
Assess and prioritize high-value AI operations use cases
Start with a data and topology audit to map sources, quality, and access. Select a small portfolio of high-value use cases such as RCA, change-risk prediction, and dispatch optimization, and baseline metrics for MTTR, FTR, SLA breaches, and truck rolls. Decide deployment patterns early, balancing on-prem GPU efficiency for sensitive workloads with cloud elasticity for experimentation.
Build the AI operations platform: RAG, agents, MLOps/AIOps
Stand up an AI operations stack with model cataloging, feature and vector stores, retrieval-augmented generation, and agent orchestration. NVIDIA AI Enterprise with NeMo and NIM can accelerate model customization and serving, while MLOps and AIOps pipelines handle data ingestion, evaluation, rollback, and drift management. Ensure tight integration with assurance, orchestration, and ticketing to enable closed-loop control.
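A minimal retrieval-augmented sketch, with a toy hashing embedder standing in for a real embedding model and vector store, shows the basic flow from incident description to a grounded prompt:

```python
# Sketch: retrieval-augmented generation over operational runbooks. The hashing
# embedder is a stand-in for a real embedding model; SOP snippets are invented.
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words hashing embedder; replace with a real embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

runbooks = [
    "SOP-112: Fiber cut on access ring - dispatch field team, reroute traffic.",
    "SOP-287: AMF overload - scale out AMF pods, verify registration success rate.",
    "SOP-341: BGP flap on edge router - check optics, dampen the session.",
]
index = np.stack([embed(doc) for doc in runbooks])   # tiny in-memory vector index

query = "registration failures spiking after AMF pod restarts"
scores = index @ embed(query)                        # cosine similarity (unit vectors)
top = runbooks[int(np.argmax(scores))]

prompt = (f"Incident: {query}\n"
          f"Relevant procedure: {top}\n"
          f"Propose next diagnostic steps, citing only the procedure above.")
print(prompt)
```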
Operationalize safely with HITL, approvals, and testing
Adopt human-in-the-loop patterns with approval gates and well-defined blast radii. Use A/B testing and canary releases for automated actions. Build red-team and safety evaluation suites specific to telecom operations. Update change management and incident playbooks for AI-driven workflows, and invest in upskilling engineers on prompt design, policy authoring, and agent governance.
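The sketch below illustrates a canary pattern for automated remediations, with illustrative thresholds and a deterministic routing rule:

```python
# Sketch: canary rollout for an automated remediation. A small fraction of
# eligible incidents is routed to the agent; the rollout halts if the canary's
# MTTR regresses against the manual control group. Thresholds are illustrative.
import random
from statistics import mean

CANARY_FRACTION = 0.1          # 10% of eligible incidents go to the agent
MAX_MTTR_REGRESSION = 1.05     # halt if canary MTTR exceeds control by >5%

def route(incident_id: str) -> str:
    random.seed(incident_id)   # deterministic assignment per incident
    return "agent" if random.random() < CANARY_FRACTION else "manual"

def evaluate(canary_mttr: list[float], control_mttr: list[float]) -> str:
    if not canary_mttr or not control_mttr:
        return "continue"      # not enough data yet
    if mean(canary_mttr) > MAX_MTTR_REGRESSION * mean(control_mttr):
        return "halt_and_review"
    return "expand_canary"

print(route("INC-20931"))
print(evaluate(canary_mttr=[42, 55, 38], control_mttr=[61, 58, 70]))
```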
Measure and iterate toward higher autonomy
Track outcome KPIs, unit costs per incident, and customer impact, not just model metrics. Expand across domains as confidence grows, from NOC to IT ops to customer care, reusing the service graph as the common backbone. Feed learnings back into model fine-tuning and policy updates to move progressively toward higher levels of autonomy.
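One way to formalize that progression is a promotion gate that raises a workflow's autonomy level only after a clean KPI window; the thresholds and review window below are illustrative assumptions loosely mapped to the TM Forum autonomous-networks ladder.

```python
# Sketch: a promotion gate that allows a workflow to advance to a higher autonomy
# level only when every week in the review window cleared the KPI thresholds.
WINDOW_WEEKS = 8

thresholds = {"auto_triage_rate": 0.80, "rollback_rate_max": 0.02, "sla_breach_max": 0.01}

weekly_kpis = [
    {"auto_triage_rate": 0.84, "rollback_rate": 0.01, "sla_breach": 0.005},
    # ... one record per week of the review window
]

def ready_for_promotion(history: list[dict]) -> bool:
    if len(history) < WINDOW_WEEKS:
        return False
    return all(
        w["auto_triage_rate"] >= thresholds["auto_triage_rate"]
        and w["rollback_rate"] <= thresholds["rollback_rate_max"]
        and w["sla_breach"] <= thresholds["sla_breach_max"]
        for w in history
    )

print(ready_for_promotion(weekly_kpis))  # False until 8 clean weeks are recorded
```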
O2 Telefónica's move with Tech Mahindra and NVIDIA is a credible template for AI-native operations: service-centric, model-driven, and measurable, with a path from copilots to supervised agents and, over time, to safer autonomy.