Oracle OCI, Episode 1: What Is HPC in the Cloud? Exploring the Need for Speed
This podcast was originally published in the Oracle Cloud Innovator Series on the Oracle Cloud Infrastructure Blog.
Welcome to Oracle Cloud Infrastructure Innovators, a series of occasional articles featuring advice, insights, and fresh ideas from IT industry experts and Oracle cloud thought leaders.
Companies that want to run high performance computing (HPC) workloads in the cloud can get a significant performance boost by choosing bare metal servers over virtual machines (VMs)—and nobody does bare metal like Oracle Cloud Infrastructure.
I recently sat down with Karan Batta, who manages HPC for Oracle Cloud Infrastructure, to discuss several HPC topics, including the key differences between running HPC workloads on bare metal and running them on VMs. We also talked about Oracle’s approach to bare metal cloud and how it differs significantly from the competition.
Listen to our conversation here and read a condensed version:
Let’s start with a basic definition. What is HPC and why is everyone talking about it?
Karan Batta: HPC stands for high performance computing, and people tend to bucket a lot of stuff into the HPC category. For example, artificial intelligence (AI) and machine learning (ML) form one bucket of HPC. And if you’re doing anything beyond building a website, anything that is dynamic, it’s generally going to be high performance. From a traditional perspective, HPC is very research-oriented or scientifically oriented. It’s also focused on product development. For example, think about engineers at a big automotive company making a new car. The likelihood is that the engineers will bucket all of that development, all of the crash-testing analysis, all of the modeling of that car, into what’s now called HPC. The term HPC exists because the work is very specialized: you may need special networking gear, special compute gear, and high-performance storage, whereas less dynamic business and IT applications may not require that stuff.
Why should people care about HPC in the cloud?
Batta: People and businesses should care because it really is all about product development. It’s about the value that manufacturers and other businesses provide to their customers. Many businesses now care about it because they’ve already moved some of their IT into the cloud, and now they’re moving work that is more mission-critical for them, like product development: building a truck, building a car, building the next generation of DNA sequencing for cancer research, and things like that.
Legacy HPC workloads include things like risk analysis modeling and Monte Carlo simulation, and now there are newer kinds of HPC workloads like AI and deep learning. When it comes to doing actual computing, are they all the same or are these older and newer workloads significantly different?
Batta: At the end of the day, they all use computers and servers and network and storage. The concepts from legacy workloads have carried over into some of these modern, cloud-native workloads like AI and ML. What this really means is that some of these performance-sensitive workloads, like AI and deep learning, were born in the cloud when cloud was already taking off. It just so happened that they could use legacy HPC primitives and performance to help accelerate those workloads. And then people started saying, “Okay, then why can’t I move my legacy HPC workloads into the cloud, too?” So, at the end of the day, these workloads all use the same stuff. But I think that how they were born and how they made their way to the cloud is different.
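To make those shared primitives concrete: the Monte Carlo simulation mentioned in the question above is a classic embarrassingly parallel HPC workload, and the same scatter-work, gather-results pattern applies whether it runs on an on-premises cluster or on cloud bare metal. Here is a minimal illustrative sketch in Python (not from the podcast; the function name, sample count, and worker count are arbitrary) that estimates pi by random sampling across several processes:

import random
from multiprocessing import Pool

def count_hits(samples):
    # Count random points in the unit square that land inside the
    # quarter circle of radius 1; the hit ratio approaches pi/4.
    rng = random.Random()
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    total = 10_000_000
    workers = 4  # one process per core; an HPC cluster scales this pattern out
    with Pool(workers) as pool:
        hits = sum(pool.map(count_hits, [total // workers] * workers))
    print("pi is approximately", 4 * hits / total)

Scaled up, the only thing that changes is how the work is distributed across nodes, which is exactly why these legacy workloads map naturally onto cloud infrastructure.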
What percentage of new HPC workloads coming into the cloud are legacy, and what percentage are newer workloads like AI and deep learning? Which type is easier to move to the cloud?
Batta: Most of the newer workloads like AI, ML, containers, and serverless were born in the cloud, so there are already ecosystems available to support them there. Rather than look at it percentage-wise, I would suggest thinking about it in terms of opportunity. Most HPC workloads that are in the cloud are in the research and product development phase. Cutting-edge startups are already doing that. But the big opportunity is going to be in legacy HPC workloads moving into the cloud. I’m talking about really big workloads. Think about Pfizer, GE, and all these big monolithic companies that are running production HPC workloads on their on-premises clusters. These things have been running for 30 or 40 years and they haven’t changed.
Is it possible to run the newer HPC workloads in my old HPC environment if I already have it set up? Can companies that have invested heavily in on-premises HPC just stay on the same trajectory?
Batta: A lot of the latest, more cutting-edge HPC workloads were born in the cloud. You can absolutely run those on old HPC hardware. But they’re generally cloud-first, meaning that they’ve been integrated with graphics processing units (GPUs). Nvidia, for example, is doing a great job of making sure any new workloads that pop up are already hardware accelerated. In terms of general-purpose legacy workloads, a lot of that stuff is not GPU accelerated. If you think about crash testing, for example, GPU use still isn’t completely prevalent there. Even though you could run it on GPUs if you wanted, there’s still a long timeline for those applications to move over. So, yes, you can run new stuff on the old HPC hardware. But the likelihood is that those newer workloads have already been accelerated by other means, and so it becomes a bit of a wash.
In other words, these newer workloads are built cloud-native, so trying to run them on premises on legacy hardware is a bit like trying to put a square peg in a round hole. Is that correct?
Batta: Exactly. And you know, somebody may do that, because they’ve already invested in a big data center on premises and it makes sense. But I think over time this is going to be the case less and less.
Come talk with Karan and others about HPC on Oracle Cloud Infrastructure at SC18 in Dallas next week in booth #2806.