Huawei Kunpeng CPUs challenge Nvidia in AI chips

Image Credit: Huawei

Huawei Kunpeng CPU roadmap expands AI compute stack

Huawei outlined a multi-year server CPU plan that aligns general-purpose compute with its Ascend AI accelerators to offer a full-stack alternative to incumbent platforms.

From Kunpeng 950 in 2026 to 256 cores by 2028

At its Connect 2025 event in Shanghai, Huawei detailed two Kunpeng 950 variants slated for early 2026: a 96-core/192-thread model and a 192-core/384-thread model aimed at dense, scale-out deployments.

The company then set a 2028 target for the next step, a high-density Kunpeng part with at least 256 cores and 512 threads, alongside a second model tuned for stronger single-thread performance for AI-adjacent and database workloads.

While Huawei did not brand the 2028 chip on stage, ecosystem benchmarks referencing “Kunpeng 960” suggest that silicon and platform validation may be progressing, with Huawei also committing to continued microarchitecture and packaging improvements.

TaiShan 950 SuperPoD for mainframe-class database workloads

The Kunpeng 950 will power a TaiShan 950 SuperPoD architecture that scales to 16 nodes and up to 48 TB of memory across the pod to consolidate transactional and analytical systems.

Positioned as a replacement path for legacy mainframes and midrange systems still prevalent in financial services, the SuperPoD is designed to run distributed databases, core banking, and data warehousing with high concurrency.

With Huawei’s GaussDB distributed database, the company claims the SuperPoD can deliver notable performance gains without rewriting existing applications, putting it in the conversation with engineered systems such as Oracle Exadata.

Open-source database benchmarks show linear scaling

Benchmarks using MogDB, an open-source database derived from openGauss, indicate linear scaling characteristics in large core-count scenarios on a 256-core Kunpeng setup.

Using memory-optimized tables and 768 concurrent connections, the reported system reached approximately 4.8 million transactions per minute, pointing to the CPU’s potential for high-throughput OLTP and hybrid transactional-analytical processing.

For architects, the takeaway is less about an absolute score and more about concurrency behavior, memory bandwidth utilization, and how the platform sustains throughput as connections and threads rise.
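As a sanity check on the reported figures, the per-core arithmetic is straightforward. The even-scaling assumption below is illustrative; real per-core throughput depends on NUMA layout, lock contention, and memory bandwidth:

```python
# Back-of-envelope arithmetic on the reported MogDB result.
# The even-scaling assumption is illustrative, not measured.

TPM = 4_800_000          # reported transactions per minute
CORES = 256              # Kunpeng cores in the test setup
CONNECTIONS = 768        # concurrent client connections

tps = TPM / 60                        # transactions per second
tps_per_core = tps / CORES            # assuming even scaling
conns_per_core = CONNECTIONS / CORES  # connection fan-out

print(f"{tps:,.0f} TPS overall")                    # 80,000
print(f"{tps_per_core:.1f} TPS per core")           # 312.5
print(f"{conns_per_core:.0f} connections per core") # 3
```

Roughly 312 transactions per second per core is the kind of per-core figure worth comparing across platforms once independent numbers appear.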

Why Kunpeng matters for AI infrastructure and Nvidia’s moat

The CPU roadmap is strategically important because AI clusters depend on balanced CPU-GPU ratios and fast data pipelines that keep accelerators fed and utilized.

CPUs drive data ingestion, preprocessing, and orchestration

Even as GPUs carry training and inference, CPUs govern input pipelines, feature engineering, storage I/O, service meshes, and containerized microservices that wrap models in production.

More cores and threads at competitive power envelopes reduce bottlenecks around feeder tasks, scheduling, and data staging, improving accelerator utilization and lowering total cost per token or inference.

Seen through this lens, a 256-core Arm-based Kunpeng in 2028 would directly affect how much AI throughput Ascend accelerators can sustain per rack.
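The utilization argument can be made concrete with a toy model. All throughput numbers below are illustrative assumptions, not Huawei or Nvidia figures:

```python
# Toy model: accelerator utilization capped by the CPU-side feed rate.
# All numbers are illustrative assumptions.

def accelerator_utilization(cpu_feed_batches_s: float,
                            gpu_demand_batches_s: float) -> float:
    """Fraction of accelerator capacity that stays busy."""
    return min(1.0, cpu_feed_batches_s / gpu_demand_batches_s)

def cost_multiplier(utilization: float) -> float:
    """Idle accelerator time inflates effective cost per token."""
    return 1.0 / utilization

# A smaller feeder CPU vs. a hypothetical 256-core feeder on the
# same rack of accelerators (feed rates are invented placeholders).
demand = 1000.0  # batches/s the accelerators could consume
for cores, feed in [(96, 600.0), (256, 1200.0)]:
    u = accelerator_utilization(feed, demand)
    print(f"{cores} CPU cores: utilization {u:.0%}, "
          f"cost/token x{cost_multiplier(u):.2f}")
```

In this sketch, a CPU tier that can only feed 60% of accelerator demand inflates cost per token by roughly 1.7x, which is the economic lever the roadmap targets.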

Huawei is building a full-stack AI alternative

Huawei already ships Ascend AI accelerators and Atlas systems, and pairs them with its software stack (CANN for kernels, MindSpore for frameworks) plus operating systems and databases tuned for its silicon.

Adding high-core-count Kunpeng CPUs tightens vertical integration across CPU, accelerator, interconnect, and data platforms, similar in strategy to how Nvidia couples Grace CPUs with Hopper and GB200 platforms.

For buyers facing constrained access to Nvidia GPUs or seeking supplier diversity, a cohesive Huawei stack offers a second route to scale AI training and inference clusters.

Bridging the software gap is the toughest challenge

Nvidia’s CUDA ecosystem, extensive libraries, and ISV certifications are a durable moat that competitors must bridge with tooling, performance parity, and developer experience.

Huawei’s progress will hinge on robust compilers, graph optimizers, operator coverage, and compatibility layers for popular frameworks, along with enterprise-grade database and middleware integrations.

Success here would make the Kunpeng-plus-Ascend combination more than a hardware story; it would become an operational platform CIOs can standardize on across data, AI, and core IT.

Competitive context: Arm servers, FSI, and China’s supply strategy

The move fits a broader industry shift toward Arm-based servers, sector-specific modernization, and supply chain diversification, especially in China.

Arm server momentum versus x86 incumbency

Arm-based CPUs have gained traction in hyperscale and cloud with designs focused on core density and power efficiency, challenging established x86 roadmaps from Intel and AMD in scale-out computing.

Kunpengโ€™s trajectory into 192 and 256 cores aligns with this trend, targeting throughput-oriented services, microservices sprawl, and data platforms that benefit from many cores and memory bandwidth.

Competing will require not just core counts but also mature NUMA behavior, PCIe bandwidth for accelerators and storage, and ecosystem support across Linux distributions and hypervisors.

Financial services modernization and data gravity

Banking and insurance still run critical workloads on mainframes and proprietary appliances, creating cost and agility constraints as real-time analytics and AI inference move closer to data.

By pitching TaiShan SuperPoD as a consolidation target for OLTP databases and warehouses, Huawei is attacking a large replacement market with clear TCO and modernization narratives.

Integration with GaussDB and openGauss-compatible stacks also lowers migration friction versus rewriting core systems, a key hurdle in FSI transformations.

Sanctions, supply, and domestic ecosystems

Export controls have reshaped silicon choices in China, accelerating investment into domestic CPUs, AI accelerators, and software stacks to mitigate dependency risks.

Huawei’s end-to-end approach (CPU, AI accelerator, servers, storage, and databases) aims to stabilize supply and performance trajectories for local buyers while creating an ecosystem alternative.

The result is a regional market that may standardize on Huawei platforms faster than global markets, even as international buyers evaluate multivendor strategies.

What to watch and next steps

Stakeholders should track silicon milestones, software readiness, and third-party validations while running targeted pilots to de-risk adoption.

Guidance for telecom and cloud providers

Model total rack performance by pairing Kunpeng CPU assumptions with Ascend accelerator nodes and realistic data pipeline loads, including storage tiering and network overhead.
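One way to start that modeling is a simple rack-level throughput sketch. Every parameter below (node counts, feed rates, overhead factor) is a placeholder assumption to be replaced with measured values:

```python
# Sketch of a rack throughput model. All parameters are
# placeholder assumptions, not vendor specifications.

def rack_tokens_per_s(nodes: int,
                      accel_per_node: int,
                      accel_tokens_s: float,
                      cpu_feed_tokens_s: float,
                      net_storage_overhead: float) -> float:
    """Effective rack throughput: accelerator demand capped by the
    CPU-side feed rate, then discounted for network/storage overhead."""
    demand = nodes * accel_per_node * accel_tokens_s
    supply = nodes * cpu_feed_tokens_s
    return min(demand, supply) * (1.0 - net_storage_overhead)

# Example: 8 nodes, 8 accelerators each, 10k tokens/s per accelerator,
# CPU pipeline feeding 60k tokens/s per node, 10% overhead.
print(f"{rack_tokens_per_s(8, 8, 10_000, 60_000, 0.10):,.0f} tokens/s")
```

Swapping in different CPU feed rates makes it easy to see when a rack becomes CPU-bound rather than accelerator-bound.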

Evaluate interoperability with Kubernetes, service meshes, and CNFs/VNFs, and test multi-tenant isolation and noisy neighbor effects under mixed AI and network workloads at the edge and core.

Assess supply timelines, spares, and local support models to ensure lifecycle continuity for large-scale deployments.

Guidance for enterprises and financial architects

Identify mainframe or engineered database candidates for migration pilots using GaussDB or openGauss derivatives on TaiShan SuperPoD, with clear success metrics on throughput, latency, and operability.

Benchmark end-to-end data-to-inference pipelines to validate that CPU-side preprocessing and scheduling keep AI accelerators saturated at target utilization levels.
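A minimal harness for that kind of check times the CPU-side preprocessing loop and compares its batch rate against a target accelerator consumption rate. The preprocessing step and target rate below are stand-ins for a real pipeline:

```python
import time

# Minimal saturation check: can CPU-side preprocessing keep up with a
# target accelerator consumption rate? The workload is a stand-in.

def preprocess(batch: list) -> list:
    # Placeholder for real tokenization/feature work.
    return [x * 2 + 1 for x in batch]

def measured_batches_per_s(n_batches: int = 200,
                           batch_size: int = 10_000) -> float:
    batch = list(range(batch_size))
    start = time.perf_counter()
    for _ in range(n_batches):
        preprocess(batch)
    elapsed = time.perf_counter() - start
    return n_batches / elapsed

TARGET_BATCHES_S = 50.0  # assumed accelerator consumption rate
rate = measured_batches_per_s()
verdict = "saturates" if rate >= TARGET_BATCHES_S else "starves"
print(f"CPU feed: {rate:,.0f} batches/s "
      f"({verdict} accelerators at {TARGET_BATCHES_S} batches/s)")
```

In a real pilot, `preprocess` would be the actual tokenization or feature pipeline, and the target rate would come from measured accelerator throughput.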

Plan for skills and tooling: compilers, observability, and MLOps integration are as critical as raw core counts.

Metrics and milestones to watch

Watch for independent Kunpeng 950 and 256-core platform reviews, SPEC-style results, and real-world database benchmarks beyond vendor labs.

Track software maturity: operator coverage in CANN, framework compatibility, database feature parity, and ISV certifications for security, compliance, and HA/DR.

Monitor ecosystem uptake in China’s hyperscale and FSI sectors; early production references will be a leading indicator of viability versus entrenched Nvidia-centric stacks.

