Any Model, Any Accelerator, Any Cloud: Unlocking Enterprise AI with Open Source Innovation
Powering the Future of Enterprise AI

“Any workload, any app, anywhere” was the mantra at Red Hat Summit 2023. Over the past two years, we have certainly seen some changes in IT. But Red Hat’s vision has not changed; it has evolved.
Any Model, Any Accelerator, Any Cloud
That is the hybrid cloud message for the Artificial Intelligence (AI) era. The best part? Just like the “old” hybrid cloud, it is all fuelled by open source innovation. At Red Hat Summit this year, we showed how AI ecosystems structured around open source and open models can create new options for enterprises.
Openness brings choice, and choice brings greater flexibility—from the model that best meets organisational needs, to the underlying accelerator, and out to where a workload will actually run. Successful AI strategies will follow the data, wherever it lives on the hybrid cloud.
And what fuels the hybrid cloud? Open source.
Inference makes AI better
To me, we need to start looking beyond models. Yes, models are central to AI strategies, but without inference, the “doing” phase of AI, they are just collections of data that do not actually do anything. Inference determines how fast a model responds to user input and how efficiently decisions can be made on accelerated compute resources; slow responses and poor efficiency ultimately cost both money and customer trust.
This is why I am excited that Red Hat is putting inference front and centre of our work with open source AI, starting with the launch of Red Hat AI Inference Server. Built on the leading open source vLLM project and enhanced with technologies from Neural Magic, Red Hat AI Inference Server brings a supported, lifecycled and production-ready inference server to AI deployments. Best of all, it can truly follow your data, wherever it lives—any Linux platform, any Kubernetes distribution, Red Hat or otherwise, will work with the solution.
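To make that concrete, here is a minimal sketch of what the underlying vLLM engine looks like in use, assuming the open source vllm Python package and an openly available model. The model ID is only an example, and Red Hat AI Inference Server packages and supports the engine rather than requiring this exact code.

```python
# Minimal vLLM sketch: load an open model and run a batch of prompts locally.
# Assumes the open source "vllm" package is installed and the example model
# is accessible; swap in whichever model fits your organisation's needs.
from vllm import LLM, SamplingParams

llm = LLM(model="ibm-granite/granite-3.1-8b-instruct")  # example Hugging Face model ID
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(
    ["Summarise why inference speed matters for enterprise AI."], params
)
print(outputs[0].outputs[0].text)
```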
What’s Better Than Enterprise AI? Enterprise AI at Scale
The killer application for enterprise IT isn’t some single, unified workload or new cloud service: It’s the ability to scale—quickly and efficiently. This is true for AI, too. But AI comes with a unique twist in that the accelerated compute resources underlying AI workloads also need to scale. That’s no small task, given the expense and skills required to properly implement this hardware.
What we need is not just the ability to scale AI, but also to distribute massive AI workloads across multiple accelerated compute clusters. The challenge is compounded by the inference-time scaling that reasoning models and agentic AI require. By sharing the burden, performance bottlenecks can be reduced, efficiency can be enhanced and, ultimately, the user experience improved. Red Hat has taken a step toward answering this pain point with the open source llm-d project.
Led by Red Hat and backed by AI industry leaders across hardware acceleration, model development and cloud computing, llm-d pairs the proven power of Kubernetes orchestration with vLLM, putting two leading lights of open source together to answer a very real need. Along with technologies like AI-aware network routing, KV cache offloading and more, llm-d decentralises and democratises AI inference, helping organisations get more out of their compute resources while running more cost-efficient and effective AI workloads.
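From an application’s point of view, that plumbing stays out of sight: vLLM-based serving exposes an OpenAI-compatible API, and a distributed deployment can sit behind the same kind of endpoint. Here is a minimal client sketch, assuming a hypothetical in-cluster URL and the openai Python package; the endpoint, credentials and model ID are placeholders, not llm-d specifics.

```python
# Sketch of a client calling an OpenAI-compatible inference endpoint.
# The URL, API key and model ID below are placeholders; the routing,
# KV cache handling and scheduling all happen behind the endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://inference.example.internal/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="ibm-granite/granite-3.1-8b-instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Why distribute inference across clusters?"}],
)
print(response.choices[0].message.content)
```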
Open (Source) to What’s Next in AI
llm-d and vLLM, delivered by Red Hat AI Inference Server, are open source technologies primed to answer today’s challenges in enterprise AI right now. But upstream communities don’t just look at what needs to be done now. AI technologies have a unique way of condensing timelines: the rapid pace of innovation means that something you thought would not be a challenge for years to come suddenly must be met head-on.
This is why Red Hat is committing resources to working upstream in Llama Stack, the Meta-led project to deliver standardised building blocks and APIs for gen AI application lifecycles. More than that, Llama Stack is very well suited to building agentic AI applications, which represent a further evolution of the powerful gen AI workloads we see today. Beyond the upstream, we’re making Llama Stack available as a developer preview within Red Hat AI, for organisations that want to engage with the future today.
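For a feel of what those building blocks look like from a developer’s chair, here is a minimal sketch assuming the llama-stack-client Python package and a Llama Stack server already running locally; the port and model ID are assumptions that will vary by distribution.

```python
# Minimal Llama Stack client sketch: request a chat completion from a running server.
# Assumes the "llama-stack-client" package and a local server; the base URL and
# model ID are placeholders to adjust for your own deployment.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "What does Llama Stack standardise?"}],
)
print(response.completion_message.content)
```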
When it comes to AI agents, we still lack a common protocol for how applications provide them with context and information. This is where the Model Context Protocol (MCP) comes in. Developed and open sourced by Anthropic in late 2024, it offers a standardised protocol for these agent-to-application interactions, not unlike client-server protocols in more traditional computing. The big deal is that existing applications can suddenly become AI-capable without extensive redevelopment. That is huge, and it would not be possible without the power of open source. Like Llama Stack, MCP is available as a developer preview in the Red Hat AI platform.
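To illustrate how small that integration surface can be, here is a minimal sketch of an MCP server using the official Python SDK, with a made-up tool standing in for an existing application function.

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# The tool below is a made-up stand-in for an existing application function
# that an AI agent could call over the protocol.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")  # arbitrary server name

@mcp.tool()
def lookup_stock(sku: str) -> str:
    """Return the stock level for a product SKU (stubbed for illustration)."""
    return f"SKU {sku}: 42 units in stock"

if __name__ == "__main__":
    mcp.run()  # defaults to stdio, so an agent host can launch it as a subprocess
```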
Proprietary AI models may have taken an early lead, but open ecosystems have certainly taken over—especially in the software that supports these next-generation AI models. Through vLLM and llm-d, along with hardened enterprise open source products, the future of AI is bright, no matter the model, the accelerator or the cloud. And it is powered by open source and Red Hat.