“Any workload, any app, anywhere” was the mantra at Red Hat Summit 2023. Over the past two years… well, we’ve seen some changes in IT. But Red Hat’s vision hasn’t changed; it’s evolved.
Any model. Any accelerator. Any cloud.
That’s the hybrid cloud message for the AI era. The best part? Just like the “old” hybrid cloud, it’s all fueled by open source innovation. At Red Hat Summit this week, we’re showing how AI ecosystems structured around open source and open models can create new options for enterprises. Openness brings choice, and choice brings greater flexibility - from the model that best meets organizational needs, to the underlying accelerator, and out to where a workload will actually run. Successful AI strategies will follow the data, wherever it lives on the hybrid cloud.
And what fuels the hybrid cloud? Open source.
Inference makes AI better
To me, we need to start looking beyond models - yes, models are central to any AI strategy, but without inference - the “doing” phase of AI - a model is just a collection of weights that doesn’t “do” anything. Inference is where a trained model actually responds to user input, and how quickly and efficiently it can do so on accelerated compute matters: slow responses and poor efficiency ultimately cost both money and customer trust.
This is why I’m excited that Red Hat is putting inference front and center of our work with open source AI, starting with the launch of Red Hat AI Inference Server. Built on the leading open source vLLM project and enhanced with technologies from Neural Magic, Red Hat AI Inference Server brings a supported, lifecycled and production-ready inference server to AI deployments. Best of all, it can truly follow your data, wherever it lives - any Linux platform, any Kubernetes distribution, Red Hat or otherwise, will work with the solution.
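To make that concrete, here’s a rough sketch of the upstream vLLM pattern the server builds on: launch an OpenAI-compatible endpoint, then query it with a standard client. The model name and port below are illustrative assumptions, not product defaults.

```python
# Sketch only: start an OpenAI-compatible vLLM server, then talk to it from any
# OpenAI client. Model and port are illustrative, not Red Hat AI defaults.
#
#   vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
#
from openai import OpenAI

# The server speaks the OpenAI API, so the standard client works against it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Why does inference efficiency matter?"}],
)
print(response.choices[0].message.content)
```

Because the interface is the de facto standard OpenAI API, applications written against a hosted endpoint can be pointed at an inference server running wherever the data lives.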
What’s better than enterprise AI? Enterprise AI at scale.
The killer application for enterprise IT isn’t some single, unified workload or new cloud service: It’s the ability to scale - quickly and efficiently. This is true for AI, too. But AI comes with a unique twist in that the accelerated compute resources underlying AI workloads also need to scale. That’s no small task, given the expense and skills required to properly implement this hardware.
What we need is not just the ability to scale AI, but also to distribute massive AI workloads across multiple accelerated compute clusters. That challenge is compounded by the inference-time scaling that reasoning models and agentic AI require. By sharing the burden, performance bottlenecks can be reduced, efficiency improved and, ultimately, the user experience made better. Red Hat has taken a step toward answering this pain point with the open source llm-d project.
Led by Red Hat and backed by AI industry leaders across hardware acceleration, model development and cloud computing, llm-d pairs the proven power of Kubernetes orchestration with vLLM, putting two leading lights of open source together to answer a very real need. Along with technologies like AI-aware network routing, KV cache offloading and more, llm-d decentralizes and democratizes AI inference, helping organizations get more out of their compute resources while running more cost-efficient and effective AI workloads.
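llm-d defines its own upstream components for that AI-aware routing and serving, so the sketch below isn’t llm-d itself - it’s just a plain illustration, using the standard Kubernetes Python client, of the Kubernetes-plus-vLLM pairing it builds on: a handful of vLLM replicas scheduled as ordinary GPU workloads. The image, model and namespace are assumptions for illustration.

```python
# Sketch only: schedule several vLLM replicas as a plain Kubernetes Deployment.
# This is NOT llm-d's own API; it just shows the orchestration layer it builds on.
from kubernetes import client, config

config.load_kube_config()  # assumes a configured kubeconfig

container = client.V1Container(
    name="vllm",
    image="vllm/vllm-openai:latest",          # upstream vLLM serving image (assumed tag)
    args=["--model", "Qwen/Qwen2.5-7B-Instruct"],
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="vllm-workers"),
    spec=client.V1DeploymentSpec(
        replicas=4,  # scale out the inference fleet like any other workload
        selector=client.V1LabelSelector(match_labels={"app": "vllm"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "vllm"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

What llm-d adds on top of plain orchestration like this is the intelligence - AI-aware routing, KV cache handling and more - that a vanilla Deployment alone can’t provide.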
Open (source) to what’s next in AI
llm-d and vLLM - delivered through Red Hat AI Inference Server - are open source technologies primed to answer today’s challenges in enterprise AI, right now. But upstream communities don’t just look at what needs to be done now. AI technologies have a unique way of condensing timelines - the rapid pace of innovation means that something you thought wouldn’t be a challenge for years suddenly has to be met head on.
This is why Red Hat is committing resources to working upstream in Llama Stack, the Meta-led project to deliver standardized building blocks and APIs for gen AI application lifecycles. More than that, Llama Stack is well suited to building agentic AI applications, which represent a further evolution of the powerful gen AI workloads we see today. Beyond the upstream, we’re making Llama Stack available as a developer preview within Red Hat AI, for organizations that want to engage with that future today.
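For a flavor of what those standardized building blocks look like from a developer’s seat, here’s a minimal sketch against a locally running Llama Stack distribution using the upstream llama-stack-client. The endpoint, model ID and exact method shapes vary across Llama Stack versions, so treat the specifics as assumptions.

```python
# Sketch only: call a Llama Stack server's inference API through the upstream
# Python client. Endpoint, model ID and method details are version-dependent assumptions.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Request a chat completion through the stack's standardized inference building block.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "What does an agentic workflow look like?"}],
)
print(response.completion_message.content)
```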
When it comes to AI agents, we’re still lacking a common protocol for how other applications provide context and information to them. This is where the Model Context Protocol (MCP) comes in. Developed and open sourced by Anthropic in late 2024, it offers a standardized protocol for these agent-to-application interactions, not unlike client-server protocols in more traditional computing. But the big deal is that existing applications can suddenly become AI-capable without extensive redevelopment. That’s huge, and it wouldn’t be possible without the power of open source. Like Llama Stack, MCP is available as a developer preview in the Red Hat AI platform.
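To see how little it takes to put an existing capability behind that protocol, here’s a minimal sketch using the open source MCP Python SDK’s FastMCP helper; the inventory lookup is a made-up stand-in for whatever logic an application already has.

```python
# Sketch only: expose an existing application function as an MCP tool so any
# MCP-capable agent can call it. The inventory lookup is a hypothetical stand-in.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory")

@mcp.tool()
def check_stock(sku: str) -> str:
    """Return the stock level for a product SKU from the existing application."""
    # In a real integration this would call the application's existing logic or API.
    return f"SKU {sku}: 42 units on hand"

if __name__ == "__main__":
    # Serve the tool over MCP so agents can discover and invoke it.
    mcp.run()
```

The application logic doesn’t change; the protocol simply gives agents a standard way to discover and call it.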
Proprietary AI models may have taken an early lead, but open ecosystems have certainly taken over - especially in the software that supports these next-generation AI models. Through vLLM and llm-d, along with hardened enterprise open source products, the future of AI is bright, no matter the model, the accelerator or the cloud. And it’s powered by open source and Red Hat.
About the author
Chris Wright is senior vice president and chief technology officer (CTO) at Red Hat. Wright leads the Office of the CTO, which is responsible for incubating emerging technologies and developing forward-looking perspectives on innovations such as artificial intelligence, cloud computing, distributed storage, software defined networking and network functions virtualization, containers, automation and continuous delivery, and distributed ledger.
During his more than 20 years as a software engineer, Wright has worked in the telecommunications industry on high availability and distributed systems, and in the Linux industry on security, virtualization, and networking. He has been a Linux developer for more than 15 years, most of that time spent working deep in the Linux kernel. He is passionate about open source software serving as the foundation for next generation IT systems.