IBM is quietly reengineering its enterprise AI ecosystem through partnerships with Anthropic and Groq. (Photo by Matthias Balk/picture alliance via Getty Images)
As OpenAI, Google, and other tech giants chase ever-larger models, each claiming a new benchmark record every month, enterprises face a quieter but far more practical challenge: inference.
The process of running trained AI models to analyze new data and generate answers might sound simple in theory, but at scale, it’s where most companies stumble. GPUs, originally engineered for graphics rendering, excel at raw computation yet falter under the weight of millions of real-time queries. This leads to soaring costs, latency issues, and massive energy demands.
IBM, which has long presented itself as the architect of enterprise computing, is stepping into that gap. Instead of chasing larger models, the company is positioning itself as the AI enabler, the connective layer that turns intelligence into execution. Its latest ecosystem bet focuses on the invisible but critical foundation of modern AI: inference infrastructure.
Through new partnerships with Anthropic and with Groq, the California-based startup known for its Language Processing Units (LPUs), IBM aims to reimagine how enterprise AI operates in production environments.
“Data is everywhere: multiple clouds, edge, on-premises. Enterprise AI must be able to work across hybrid environments. We have a layered model strategy, balancing IBM-built innovation with strategic partnerships to accelerate outcomes,” Rob Thomas, SVP and chief commercial officer at IBM, told me. “We take advantage of various models: small language models like Granite, large language models from our partners like Mistral and Meta, and frontier models through our partnership with Anthropic, and use the best model for each use case.”
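To make that layered strategy concrete, here is a minimal, hypothetical sketch of a "best model for each use case" router. The task fields, routing rules, and model labels are illustrative stand-ins, not IBM's actual watsonx logic.

```python
# Illustrative sketch only: task fields, routing rules, and model labels are
# hypothetical stand-ins, not IBM's actual watsonx implementation.
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    needs_frontier_reasoning: bool = False  # e.g., open-ended research queries
    domain_specific: bool = False           # e.g., tuned enterprise workflows


def pick_model(task: Task) -> str:
    """Route each task to the cheapest model tier that can handle it."""
    if task.needs_frontier_reasoning:
        return "frontier-model"        # a Claude-class partner model
    if task.domain_specific:
        return "small-language-model"  # a Granite-style SLM
    return "general-purpose-llm"       # a Mistral- or Meta-style model


print(pick_model(Task("Summarize this claims document", domain_specific=True)))
# -> small-language-model
```

The design intent in such a setup is cost discipline rather than raw capability: routine enterprise tasks rarely need a frontier model, so a router keeps the expensive tier reserved for the queries that justify it.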
With Groq’s inference hardware now integrated into IBM’s watsonx Orchestrate, the company claims enterprises can run agentic AI systems up to five times faster and more cost-efficiently than traditional GPU-based setups.
“AI is still stuck in the ‘dial‑up’ era – models can give accurate answers but to give high-quality research-grade answers can mean waiting up to 10 minutes while an LLM or agent goes off and thinks,” said Jonathan Ross, CEO and founder of Groq. “Faster processing also drives up usage and thus compute costs, so speed must be coupled with cost efficiency.”
Traditional GPUs, Ross explained, excel at parallel, batch-oriented workloads such as model training. But when it comes to low-latency, multi-step reasoning, the kind of dynamic execution required for agentic AI, GPUs falter. LPUs use a software-controlled, assembly-line architecture, moving data in a deterministic flow to eliminate bottlenecks common in GPUs and deliver real-time AI performance.
“Agentic AI improves LLM outputs by decomposing a task into a series of explicit steps and executing each step sequentially. This ‘think‑in‑steps’ approach yields better results, though it also multiplies the compute required, driving up both latency and cost,” he told me. “GPUs are the right tool for training, or the creation of AI models. LPUs are the right tool for inference, or the running of AI models.”
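Ross's point about multiplying compute is easy to see in a small sketch. The loop below, with hypothetical function names and an illustrative latency figure (it is not Groq's or IBM's code), shows how executing a plan step by step turns one user request into several sequential inference calls.

```python
# Minimal sketch of "think-in-steps" agentic execution; call_model() is a
# stand-in for any inference endpoint and the latency value is illustrative.
import time


def call_model(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for per-call inference latency
    return f"result of: {prompt.splitlines()[0]}"


def run_agent(task: str, steps: list[str]) -> tuple[str, int]:
    """Execute each explicit step in sequence, feeding results forward."""
    context = task
    for step in steps:
        context = call_model(f"{step}\nContext so far: {context}")
    return context, len(steps)  # total model calls = number of steps


plan = ["extract the key facts", "check them against policy", "draft an answer"]
answer, calls = run_agent("Review this insurance claim", plan)
print(answer, calls)
# One request triggered 3 sequential inference calls instead of 1, so latency
# and compute cost grow roughly with the length of the plan.
```

Because the calls run one after another, per-call latency compounds rather than averaging out, which is why Ross argues that inference speed, not just model quality, sets the ceiling for agentic systems.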
Inside IBM’s AI Enabler Tech Stack
As enterprises move beyond AI experimentation and into production, IBM is using its watsonx platform to unify three essential layers of the modern AI stack: intelligence, inference, and governance.
At the intelligence layer, IBM’s collaboration with Anthropic brings Claude models into the stack, offering reasoning with transparency, a critical advantage for regulated industries that require explainable systems, not opaque black boxes. Beneath that, Groq’s deterministic LPUs replace power-hungry GPUs to deliver low-latency inference at scale, keeping pace with real-world enterprise workloads. Anchoring the system is IBM’s governance foundation, powered by Red Hat OpenShift, which ensures that every AI interaction remains auditable, compliant, and secure across hybrid and on-premises environments.
“IBM’s strategy seems to be working as of now,” Dario Maisto, senior cloud analyst at Forrester, told me. “Whether it will promote sustainable growth in the long run is a different question. On the upside, this way of working gives IBM the flexibility to leverage new AI solutions and partnerships more opportunistically.”
IBM says its strategy is already finding traction among clients with heavy inference demands. For instance, a major health insurance provider uses watsonx Orchestrate to process thousands of patient and provider queries simultaneously. By running AI agents on Groq’s LPUs, the system now delivers real-time answers with near-zero lag, helping improve patient satisfaction and accelerate claim approvals.
“Our role is to provide the full technology stack enterprises need to operationalize AI. Our watsonx portfolio accelerates ROI in core workflows. Our Granite models remain critical for enterprise-specific tasks where control and customization matter. And frontier models from partners like Anthropic push the boundaries of what’s possible,” added Thomas. “Three years out, I believe IBM will be the enabling layer that makes AI work across the entire enterprise technology estate.”
IBM’s Modular Strategy vs. Hyperscaler Dominance
The next phase of the AI race may hinge on who can deploy the technology most intelligently, and IBM is charting a different course. While Microsoft and Google have built tightly integrated ecosystems around their foundation models and clouds, IBM’s strategy is modular and hybrid by design, avoiding ecosystem lock-in.
Microsoft has heavily relied on OpenAI’s frontier models, embedding GPT copilots across Microsoft 365, Azure, GitHub, and Windows. The company reportedly now holds roughly 27% of OpenAI, valued at around $135 billion in a recent recapitalization. That stake has helped fuel a powerful feedback loop that strengthens its models and secures enterprise loyalty, driving a $123 billion annualized cloud revenue run rate as of Q3 2025.
Google, pursuing vertical integration, combines proprietary Gemini and open-source Gemma models to power Vertex AI and platforms like Search and Android. Its TPUs and data control enable end-to-end ownership of the AI stack, contributing to roughly $61 billion in cloud revenue.
IBM, meanwhile, focuses on orchestration, not ownership. Its watsonx platform and Small Language Models (SLMs) support hybrid and on-premises environments, while Red Hat OpenShift enables deployment across clouds or secure, air-gapped systems. With roughly 2% cloud share, IBM is doubling down on next-generation inference and reliability.
“Every hyperscaler wants to own the on-ramp; enterprises want freedom across clouds, sovereign regions, and edge sites,” said Haseeb Budhani, CEO and co-founder of Rafay. “The winner is the platform that makes that experience feel the same everywhere, whether it’s a public cloud, on-prem, or neocloud. That’s a high bar. If IBM clears it, great. If not, multi-cloud stays a slide, not a system.”
Budhani added that in AI infrastructure, economics, not hardware, will decide the race. “GPUs don’t sell themselves; experience does,” he said. When orchestration improves utilization and enables predictable spending, especially across sovereign and regional clouds, procurement decisions shift.
After years of being perceived as cautious in the AI arms race, IBM is now aiming to carve out a leadership position by mastering the infrastructure that makes AI practical.
“Experimentation is easy. Production is hard. Companies encounter three key challenges when attempting to achieve ROI from AI: speed, cost, and reliability. That’s where IBM’s full technology stack, including our new partnership with Groq, changes the game,” said Thomas. “We give clients a choice.”
