Memory And Design Advances From The AI Infra Summit

This article discusses talks on memory and on chip and system design given by Kove, Pliops and Cadence at the 2025 AI Infra Summit in Santa Clara, CA.
John Overton from Kove gave a talk about the company's Linux-based memory software, which runs on any hardware and allows sharing memory between servers to increase memory, CPU and GPU utilization. He pointed out that GPUs and CPUs have kept scaling, but conventional memory systems have not, leading to overprovisioned servers and processing bottlenecks.
He said that Kove SDM software can be installed on a system in 15 minutes, provides effectively unlimited memory drawn from virtualized, elastic memory pools spanning servers, and supports up to 64PiB of DRAM per process. He also said the software can hide latency, making memory look local to a CPU even when it isn't, and that it works across InfiniBand and RoCE fabrics.
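Overton did not detail the mechanism behind Kove's latency hiding, but the general technique of overlapping remote memory fetches with computation can be sketched in a few lines of Python. The `fetch_remote` function below is a hypothetical stand-in for pulling a block from a remote memory pool; the pipelining pattern, not the API, is the point, and this is an illustration of the general idea rather than Kove's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_remote(block_id: int) -> bytes:
    """Hypothetical stand-in for pulling a block from a remote memory pool."""
    time.sleep(0.001)  # simulate ~1 ms of fabric latency
    return bytes(4096)

def process(block: bytes) -> int:
    """Placeholder for useful work done on each block."""
    return sum(block)

def run_pipelined(num_blocks: int) -> None:
    # Prefetch block i+1 while processing block i, so fabric latency
    # overlaps with compute instead of adding to total runtime.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_remote, 0)
        for i in range(num_blocks):
            block = future.result()  # wait for the current block
            if i + 1 < num_blocks:
                future = pool.submit(fetch_remote, i + 1)  # prefetch next
            process(block)  # compute while the next fetch is in flight

run_pipelined(100)
```

When the fetch of the next block completes before processing of the current one finishes, the remote latency effectively disappears from the total runtime, which is the behavior Kove describes as making remote memory look local.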
In particular, he said Kove SDM can hide the latency of memory located more than 150m away. The image below shows the resulting AI performance improvements.
Kove AI performance improvements
Kove
He said that AI inference can run 3-5X faster with Kove, and that this has been demonstrated with partners such as Red Hat and Supermicro, showing key-value (KV) caching at scale in benchmarks using Redis and Valkey. The slide below summarizes the company's claimed ROI improvements from using Kove. He also said it can improve vector search and enable more concurrent processing.
Kove SDM return on investment
Kove
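The Redis and Valkey benchmarks refer to external key-value caching, a common pattern for reusing expensive inference results rather than recomputing them. The sketch below shows the general idea using the redis-py client (Valkey is wire-compatible with Redis); it is a minimal illustration under that assumption, not the benchmark harness Kove and its partners used, and `generate_response` is a placeholder for a real inference call.

```python
import hashlib
import redis  # pip install redis; Valkey speaks the same protocol

r = redis.Redis(host="localhost", port=6379)

def generate_response(prompt: str) -> bytes:
    """Placeholder for an expensive LLM inference call."""
    return f"response to: {prompt}".encode()

def cached_inference(prompt: str) -> bytes:
    # Key on a hash of the prompt so repeated requests hit the cache
    # instead of re-running inference.
    key = "kv:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit  # cache hit: skip inference entirely
    result = generate_response(prompt)
    r.set(key, result, ex=3600)  # cache miss: store with a one-hour expiry
    return result
```

The larger the memory pool backing the cache, the higher the hit rate, which is where pooled memory systems like Kove's claim their inference speedups.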
Pliops was showcasing its XDP LightningAI at the Summit, a GenAI-native memory stack to power inference and retrieval workloads for hyperscale and enterprise applications. The product consists of an ASIC (the Extreme Data Processor, or XDP), a software stack and distributed nodes, and uses a GPU-initiated key-value I/O interface.
According to Pliops, deploying XDP LightningAI in data centers offers significant cost savings. Instead of needing four times as many GPU servers to support 4X more users for LLM inferencing, operators can keep their existing GPU infrastructure and add XDP LightningAI servers. Pliops says this results in a 67% reduction in rack space, a 66% reduction in power consumption, 58% annual OpEx savings and a 69% decrease in initial investment costs.
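As a rough illustration of where a figure like the quoted 67% could come from, the arithmetic below assumes a hypothetical fleet of 100 GPU servers and supposes that 32 added XDP LightningAI servers deliver the 4X user capacity that would otherwise take 400 GPU servers. Both server counts are assumptions chosen only to make the math concrete; they are not figures from Pliops.

```python
# Hypothetical capacity math; all server counts are illustrative assumptions.
baseline_gpu_servers = 100  # existing GPU fleet
scale_factor = 4            # target: support 4X more users

# Option A: scale the GPU fleet linearly.
option_a = baseline_gpu_servers * scale_factor       # 400 servers

# Option B: keep existing GPUs, add XDP LightningAI servers.
added_xdp_servers = 32                               # assumed count
option_b = baseline_gpu_servers + added_xdp_servers  # 132 servers

saving = 1 - option_b / option_a
print(f"rack-space reduction: {saving:.0%}")         # -> 67%
```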
Pliops is collaborating with Tensormesh, a maker of inference optimization software. The company says that combining LightningAI memory acceleration with Tensormesh's shared KV cache architecture yields faster time-to-first-token and GPU savings across multi-GPU clusters.
Another interesting talk was given by Charles Alpert of Cadence, an electronic design software company that has enabled device and system simulation for decades. He spoke about the various challenges in AI infrastructure, shown in the slide below, including energy consumption driving gigawatt-scale data centers, thermal management and the time required to operationalize that infrastructure.
AI infrastructure challenges
Cadence
He talked about how combining AI for design with design for AI can create a virtuous cycle of continuous improvement in data centers and the devices that work with them. Cadence has tools for data center design as well as its traditional semiconductor design tools; in semiconductors, its tools can also work with 3D stacks of die and include multiphysics digital twin simulation. He said that over half the chips built today are designed using AI technology and that this will grow to 90% in the next few years with agentic AI, as shown below.
AI driven chip design trends
Cadence
Agentic AI should lead to increasing levels of autonomous design, analogous to the levels used to describe autonomous driving, as shown below. In his talk he went into detail on each of these levels of increasing autonomy.
Levels of autonomous chip design
Cadence
He also talked about how these new EDA tools with multiphysics capability will enable the design of 3D devices made from stacked semiconductor die, often called heterogeneous integration, as shown in the figure below. Doing this requires massive system-level integration and is resource intensive.
AI driven 3D die integration design
Cadence
In particular, he pointed out that these 3D design workloads will benefit from improvements in the number and capability of GPUs.
Last year Cadence announced its Palladium emulator, shown below. Jensen Huang of Nvidia called it an essential tool for designing Nvidia's most advanced chips, including the Blackwell series.
Cadence Palladium emulator
Cadence
Cadence has expanded this technology beyond chip design to create digital twins of data centers, including all of their functional components. Its Millennium M2000 system enables much faster chip design as well as system design. These tools are part of Cadence's digital twin ecosystem for data center design, as shown below.
Cadence Reality digital twin ecosystem
Cadence
At the 2025 AI Infra Summit, Kove talked about massive memory sharing to accelerate AI inference, Pliops showed its AI-native memory stack and Cadence discussed complete digital twin design of data centers.