Nvidia’s AI Factory Vision Comes Into Focus With Rubin CPX

Posted by Jim McGregor, Contributor


At the InfraAI Global Summit’25, Nvidia announced a new member of its upcoming Vera Rubin data center AI product family. The Rubin CPX will complement the standard Rubin AI Graphics Processing Unit (GPU) by providing high-value inference content generation at a lower cost. More importantly, it fits into the data center infrastructure Nvidia has designed for multi-AI-GPU data centers.

Tirias Research has consulted for Nvidia and other AI companies mentioned in this article.

Tirias Research has long forecast the need for a variety of AI inference accelerators from companies like AMD, Intel, Nvidia and anyone else developing AI semiconductor solutions. As with any other data center workload, no two AI models are the same. As consumers and enterprises adopt AI and AI models continue to evolve, there will be opportunities to optimize hardware around an AI model or group of models. However, GPUs will remain one of the best solutions for both AI training and AI inference processing for two key reasons, which Nvidia is building on with the Rubin CPX announcement.

The Value Of The AI GPU

The first reason is the nature of the semiconductor industry. The tech industry swings like a pendulum. When new technology is introduced, there is a period of rapid innovation, or in the case of AI, daily innovation. When the pace of innovation slows, standards emerge. At that point, it makes sense to consider optimizing a functional task into a dedicated chip known as an application-specific integrated circuit (ASIC). In many cases, that function may eventually be integrated into a host processor like a Central Processing Unit (CPU) or GPU. However, developing a custom chip or functional block can take three or more years. With new models and ways of processing them changing rapidly, the GPU is a more practical solution than an ASIC for most AI applications.

The second reason is the ability of GPUs to be partitioned to handle multiple AI models concurrently. There is a myth that a transition from AI training to AI inference is coming in the near future. With the deployment of models like OpenAI’s ChatGPT models, Google’s Gemini, Microsoft’s Copilot, DeepSeek’s R and V series models, Anthropic’s Claude, Perplexity AI and countless others, the vast majority of AI processing across the industry is already inference. If such a line existed, it would have been crossed several years ago. Thanks to the programmable efficiency of AI GPUs and the buildout of GPU-enabled data centers, most AI workloads, especially generative AI and agentic AI, run on GPUs because they are the most efficient option.
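As a rough illustration of one GPU serving more than one model at a time, the sketch below enqueues two models’ inference work on separate CUDA streams so the GPU can overlap them. This assumes PyTorch and a CUDA-capable GPU, and the two models are hypothetical stand-ins; production deployments would more likely use hard partitioning such as NVIDIA MIG or an inference server rather than raw streams.

    # Minimal sketch: two different models sharing one GPU via CUDA streams.
    # Illustrative only -- real multi-tenant serving typically uses NVIDIA MIG
    # partitions or an inference server for isolation and scheduling.
    import torch

    assert torch.cuda.is_available(), "this sketch requires a CUDA GPU"

    # Two stand-in "models" (hypothetical; any nn.Module would do here)
    model_a = torch.nn.Linear(1024, 1024).cuda().eval()
    model_b = torch.nn.Linear(1024, 1024).cuda().eval()

    stream_a = torch.cuda.Stream()
    stream_b = torch.cuda.Stream()

    x_a = torch.randn(64, 1024, device="cuda")
    x_b = torch.randn(64, 1024, device="cuda")

    with torch.no_grad():
        # Each model's kernels are enqueued on its own stream, so the GPU
        # can overlap the two workloads instead of serializing them.
        with torch.cuda.stream(stream_a):
            out_a = model_a(x_a)
        with torch.cuda.stream(stream_b):
            out_b = model_b(x_b)

    torch.cuda.synchronize()  # wait for both inference requests to finish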

Nvidia’s AI GPU Buildout

At GTC 2025, Nvidia introduced several key technologies for building AI-centric data centers. These included the NVL144 rack design, KV cache, Dynamo, data center blueprints and enhancements to the company’s NVLink, Spectrum-X, and Quantum-X networking technologies. KV cache allows computed key and value tensors to be stored for reuse in subsequent AI generation and shared between GPUs. Dynamo is an open-source inference framework for planning and routing AI workloads in the data center, essentially a data center workload orchestrator. The NVL144 rack design and Nvidia networking technologies form the infrastructure of the data center. And the data center blueprints running on Omniverse provide a digital twin for the design, construction, and operation of an AI data center, or AI factory as Nvidia refers to them. Now, Nvidia has introduced the Rubin CPX, an AI GPU inference accelerator optimized to do specific functions exceptionally well. With Rubin CPX, Nvidia takes another step in designing an AI factory that can be optimized for specific AI functions.
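To make the KV cache concept concrete, here is a minimal sketch of how a decoder reuses cached key and value tensors during token generation, computing keys and values only for the newest token. It assumes PyTorch; the weights, shapes and function names are illustrative stand-ins, not anything from Nvidia’s stack.

    # Minimal KV cache sketch: each decode step computes K and V only for the
    # new token and appends them to the cache, instead of recomputing K/V for
    # the entire sequence on every step.
    import torch
    import torch.nn.functional as F

    d_model = 64
    wq = torch.randn(d_model, d_model)  # toy projection weights
    wk = torch.randn(d_model, d_model)
    wv = torch.randn(d_model, d_model)

    k_cache, v_cache = [], []  # grows by one entry per generated token

    def decode_step(x_new):
        """Attend from the newest token over all cached keys/values."""
        q = x_new @ wq                # query for the new token only
        k_cache.append(x_new @ wk)    # cache this token's key ...
        v_cache.append(x_new @ wv)    # ... and value for reuse later
        K = torch.stack(k_cache)      # (seq_len, d_model)
        V = torch.stack(v_cache)
        scores = (q @ K.T) / d_model ** 0.5
        return F.softmax(scores, dim=-1) @ V

    # Each step reuses every earlier K/V entry rather than recomputing it.
    for _ in range(5):
        out = decode_step(torch.randn(d_model))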

Nvidia refers to Rubin CPX as a context inference accelerator designed for very complex AI tasks, such as software development across millions of lines of code, hours of video generation, and deep research. The Rubin CPX works in conjunction with the Vera CPU and the Rubin AI GPU. The Rubin CPX ingests the large volumes of input data, a context phase that requires high compute performance, and produces the contextual input the Rubin AI GPU then uses to generate the output, or content. This generation phase is more reliant on memory and networking bandwidth. As a result, the Rubin CPX, while built on the same Rubin AI GPU architecture, is designed differently from the Rubin AI GPU, with 128GB of GDDR7 memory plus hardware encode and decode engines to support video generation. The Rubin CPX is capable of 30 petaFLOPS of performance using the NVFP4 data format, a 3x increase in attention acceleration compared to the GB300 NVL72, and of processing a one-million-token context window. The memory and architecture changes trade approximately 20 petaFLOPS of overall performance for an increase in context-processing efficiency.
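The split between the two phases can be sketched in a few lines. The code below is a conceptual, heavily simplified model of disaggregated inference, not Nvidia’s Dynamo or Rubin software; every function and class name here is hypothetical. It shows why the context (prefill) phase is compute-bound while the generation (decode) phase repeatedly rereads a growing KV cache and is therefore bandwidth-bound.

    # Conceptual sketch of disaggregated inference: a compute-heavy "context"
    # (prefill) worker builds the KV cache for the full prompt -- the role
    # Nvidia assigns to Rubin CPX -- then hands it to a bandwidth-heavy
    # "generation" (decode) worker on the HBM-equipped Rubin GPU.
    from dataclasses import dataclass

    @dataclass
    class KVCache:
        keys: list
        values: list

    def context_phase(prompt_tokens):
        # Prefill: process the entire prompt at once. Compute-bound, which
        # is why a FLOPS-dense, GDDR7-based part suits this phase.
        kv = KVCache(keys=[], values=[])
        for tok in prompt_tokens:
            kv.keys.append(f"k({tok})")    # stand-ins for real tensors
            kv.values.append(f"v({tok})")
        return kv

    def generation_phase(kv, max_new_tokens):
        # Decode: emit one token at a time, rereading the whole KV cache on
        # each step. Memory- and bandwidth-bound, so it stays on HBM GPUs.
        out = []
        for i in range(max_new_tokens):
            tok = f"tok{i}<-attends-over-{len(kv.keys)}-entries"
            out.append(tok)
            kv.keys.append(f"k({tok})")
            kv.values.append(f"v({tok})")
        return out

    kv = context_phase(["The", "quick", "brown", "fox"])
    print(generation_phase(kv, max_new_tokens=3))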


