Oracle OCI: Episode 2: Bare Metal vs. Virtual Machines: Which is Best for HPC in the Cloud?
This podcast was originally published in the Oracle Cloud Innovator Series.
Oracle Cloud Infrastructure Blog
Welcome to Oracle Cloud Infrastructure Innovators, a series of occasional articles featuring advice, insights, and fresh ideas from IT industry experts and Oracle cloud thought leaders.
Companies that want to run high performance computing (HPC) workloads in the cloud can get a significant performance boost by choosing bare metal servers over virtual machines (VMs)—and nobody does bare metal like Oracle Cloud Infrastructure.
I recently sat down with Karan Batta, who manages HPC for Oracle Cloud Infrastructure, to discuss several HPC topics, including the key differences between running HPC workloads on bare metal and running them on VMs. We also talk about Oracle’s approach to bare metal cloud and how it differs significantly from the competition.
Listen to our conversation here, or read a condensed version below:
You often speak about the concept of bare metal cloud. Can you explain why HPC workloads are some of the best types of workloads to run in a bare metal cloud environment?
Karan Batta: Certainly. But first, let’s take a step back. A lot of cloud providers have tried bare metal, but they haven’t done it the way we have. With them, bare metal cloud always comes with an “if” or a “but” and there is always a catch. They say things like: “You want bare metal? Great. Tell us how many servers you need. We’ll go buy them and provision them manually and you can come back in three months.”
For us, bare metal is all about providing the same consistent performance you get from your on-premises cluster or on-premises data center, but with the added benefits and flexibility of the cloud. That’s really what we’ve enabled here. Our bare metal offering is a fully multi-tenant bare metal environment where any customer can come in and spin up an instance that looks just like any other instance. It just so happens that there is no Oracle software running on it, there is no hypervisor running on it, and you get better performance for what you pay. This is really what it means to be running on a bare metal cloud. HPC workloads are well suited to bare metal because of the performance boost that bare metal provides.
You mentioned that there is no hypervisor. But Oracle Cloud Infrastructure offers virtual machines (VMs) as well, correct?
Batta: Yes, definitely. We were initially called Bare Metal Cloud, but we’ve rebranded as Oracle Cloud Infrastructure because we offer VMs as well. So, if you want to run some test/dev workloads on a VM and then move them to bare metal, you can absolutely do that.
Why would an organization avoid running HPC workloads in cloud-based VMs?
Batta: When you use a hypervisor, you’re essentially looking at a performance tax of anywhere from 10 to 15 percent. That’s a rough idea of how much performance you’re going to lose because you’re adding overhead on top of your server. If I’m already paying $3, $4, or $5 per hour for an instance and losing 10-15 percent of performance, that kind of defeats the purpose of running HPC in the cloud. We’ve tried to make sure that when we talk about HPC, we mean that we’re going to match your on-premises performance and we’re going to give you an amazing price for it.
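To put rough numbers on that overhead, here is a minimal sketch. It assumes a virtualized instance delivers (1 - overhead) of the bare metal throughput at the same hourly price. The function name and the model are illustrative assumptions, not Oracle pricing or benchmark data.

```python
def effective_hourly_cost(price_per_hour: float, overhead: float) -> float:
    """Price paid per hour of bare-metal-equivalent work on a virtualized instance.

    Assumption: the hypervisor removes a fixed fraction `overhead` of throughput,
    so one paid hour yields only (1 - overhead) hours of useful work.
    """
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    return price_per_hour / (1.0 - overhead)


# Hourly prices and overhead range quoted in the interview.
for price in (3.0, 4.0, 5.0):
    for overhead in (0.10, 0.15):
        cost = effective_hourly_cost(price, overhead)
        print(f"${price:.2f}/h at {overhead:.0%} overhead -> ${cost:.2f} per useful hour")
```

Under this simple model, a $4 per hour instance with a 10 percent overhead costs about $4.44 for each hour of work that a bare metal server would complete.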
You mentioned that bare metal cloud offers a 10-15 percent performance boost over virtualized cloud environments. What does that mean for our customers?
Batta: What it means is that customers can reduce the time that workloads take from days to hours to minutes. Some people might say a 10-15 percent performance boost is not a big deal. But for anyone who runs resource-intensive HPC workloads, that is not the case. For them, 10 percent could translate to hours. If you’re running, for example, a machine learning or an artificial intelligence job, or a distributed deep learning training job for image recognition or voice translation, those types of jobs can take 16-20 hours. In some of the bigger cases, like a search engine optimization, those things take weeks to run. So, a 10 percent performance boost there could mean that you’re reducing the job by hours if not days. So, I think there is a huge difference between bare metal and VMs.
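The "hours if not days" claim can be checked with simple arithmetic. The sketch below assumes that a performance boost of b means bare metal throughput is (1 + b) times the VM throughput, so a job that takes T hours on a VM takes T / (1 + b) hours on bare metal. The function name and the two-week job length are illustrative assumptions.

```python
def hours_saved(vm_hours: float, boost: float) -> float:
    """Wall-clock hours saved by moving a job from a VM to bare metal.

    Assumption: bare metal throughput is (1 + boost) times VM throughput.
    """
    if vm_hours < 0 or boost < 0:
        raise ValueError("vm_hours and boost must be non-negative")
    return vm_hours - vm_hours / (1.0 + boost)


# Job lengths: the 16-20 hour range from the interview, plus a
# hypothetical two-week job standing in for "weeks to run".
for vm_hours in (16.0, 20.0, 14 * 24.0):
    for boost in (0.10, 0.15):
        saved = hours_saved(vm_hours, boost)
        print(f"{vm_hours:.0f} h on a VM, {boost:.0%} boost -> {saved:.1f} h saved")
```

With these assumptions, a 20-hour training job finishes a little under two hours sooner at a 10 percent boost, and a two-week job finishes more than a day sooner.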
Suppose an enterprise wants to run a combined HPC workload, with some of it on bare metal cloud and some in a virtualized environment at the same time. Is it possible to run that and scale up and down?
Batta: Yes, you could do that today on Oracle Cloud Infrastructure. And the great thing is you can scale this up, down, left, right. You know, you name it, we can do it. With Oracle Cloud Infrastructure you get performance and flexibility side by side. If you’re running an HPC job, and you just want to quickly test it, you can spin up a couple of VMs with one core or even a fraction of a core. Then you can move to a fully bare metal instance with something like 52 physical cores (the largest bare metal instance you can find on any cloud) and you can run your production workloads. The other thing bare metal provides is flexibility. Not only do you have the ability to run HPC on our VMs, but you can move your entire virtualized environment and we will para-virtualize it on top of our bare metal nodes.
Come talk with Karan and the rest of the team about HPC on Oracle Cloud Infrastructure at SC18 in Dallas next week in booth #2806.