GPUs shift the computing paradigm: A 10 to 100x performance increase coming soon to a military system near you: Interview with Kevin Berce, Business Development Manager at NVIDIA

Why is real-time video rendering about to get a whole lot better? NVIDIA's CUDA programming environment, explains Business Development Manager Kevin Berce.

GPU manufacturer NVIDIA has not only been keeping up with the times by offering processors for smartphones and tablets; it has also been enabling key shifts in the defense paradigm: Remember the days when it took four to six hours just to render one hour of UAV video? Now NVIDIA GPUs enable UAV video rendering in real time. And NVIDIA is set to deliver an ultra-accelerated, GPU-enabled 10 to 100x performance increase for the defense industry, as a recent interview with Kevin Berce, NVIDIA Business Development Manager, reveals. Edited excerpts follow.

MIL EMBEDDED: Let’s start with a high-level overview of NVIDIA.

BERCE: NVIDIA was founded in 1993, so we’re approaching our 20th year. Our annual sales today are about $3.5 billion, and we are approaching 7,000 employees across 20 countries. The NVIDIA headquarters are in Santa Clara, Calif., and we have a global presence, especially in the Asia Pacific region, where many of our board customers are located and our processors are manufactured. We focus on engineering. We hold 1,900 patents, and continue to increase that number on a steady basis.

MIL EMBEDDED: OK, so which types of technology is NVIDIA focusing on these days?

BERCE: Our core heritage comes from the consumer desktop graphics space and spans gaming, high-end video editing, and other areas. Traditionally, our GPUs power everything from consumer desktops and laptops to game consoles. In recent years, NVIDIA has expanded both ends of that scale. Now we offer a range of advanced processors for mobile devices, including tablets and smartphones; several of the new smartphones feature our Tegra processors. We also have the Tesla products, where we are using the GPU to push into high-performance computing segments like government and defense. The primary need in these markets is to process and analyze large amounts of digital information extremely fast.

MIL EMBEDDED: Which brands fall under which market segments then?

BERCE: For mobile computing, we have Tegra. In the visual computing or PC graphics space, we have GeForce at the consumer end. On the professional end, which includes CAD/CAM, digital engineering, content creation for the movie industry, and many other higher-end applications, we have the Quadro brand. Finally, the Tesla product line is for the high-performance computing space.

MIL EMBEDDED: So remind us about GPU computing.

BERCE: GPU [Graphics Processing Unit] computing is using a CPU and a GPU together to accelerate data-intensive applications. What we have found is that the GPU has evolved from its core heritage in the graphics world to become a massively parallel processor, capable of running thousands of calculations simultaneously. This was a requirement in the graphics world, and became a technology that is very adept at dealing with large computational problems that need many processor cores to divide up the large computational workloads.

So, for example, if you have to give a CPU the problem of delivering a pizza to 20 different houses, a CPU essentially would load all the pizzas in one truck and go from one house to the next, dropping off the pizza until it completed the task. It would do this essentially in a serial fashion. If you were to give the same problem to a GPU, the GPU would take all the pizzas, divide two or three up across 20 different motorcycles, and send them all out at the same time: It would divide the task up and run it in parallel.
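The pizza analogy can be sketched in a few lines of Python. This is only an illustration of the serial-versus-parallel idea, not NVIDIA code: the function names `delivery_rounds` and `split_work` are hypothetical, and the sketch assumes each house counts as one delivery and each vehicle makes one delivery per round.

```python
import math


def delivery_rounds(num_houses: int, num_vehicles: int) -> int:
    """Rounds needed if every vehicle makes one delivery per round (assumed model)."""
    if num_houses < 0 or num_vehicles < 1:
        raise ValueError("need num_houses >= 0 and num_vehicles >= 1")
    return math.ceil(num_houses / num_vehicles)


def split_work(items: list, num_workers: int) -> list:
    """Deal items round-robin into one chunk per worker."""
    return [items[i::num_workers] for i in range(num_workers)]


houses = list(range(20))

# One truck (the CPU in the analogy): every stop happens one after another.
serial_rounds = delivery_rounds(len(houses), 1)

# Twenty motorcycles (the GPU in the analogy): all stops happen in one round.
parallel_rounds = delivery_rounds(len(houses), 20)

# Each motorcycle receives its own share of the deliveries.
chunks = split_work(houses, 20)

print(serial_rounds, parallel_rounds, len(chunks))
```

The same split applies when there are fewer vehicles than houses: with 8 vehicles, the 20 deliveries would take 3 rounds under this model, which is why adding more cores helps until the work runs out.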

MIL EMBEDDED: Let’s drill down on the Tesla group for a moment. What’s involved there?

BERCE: We have the C-Class product, which is designed for workstations, and you can run multiple Tesla C-Class GPUs inside a single workstation, assuming the system has enough PCI Express slots. You can essentially build yourself a mini supercomputer alongside your desk. The Tesla M-Class products are designed slightly differently: They are passively cooled and built to fit in server racks.

MIL EMBEDDED: Do you have any numbers on how fast Tesla runs in supercomputers?

BERCE: We’re talking computing performance on the Petaflop level. [According to the most recent announcement of the Top 500 List (www.top500.org)], Tesla GPUs are powering three of the top five supercomputing systems in the world. Two of these systems are in China and one is in Japan.

The systems are dealing with big computational challenges. It’s also worth noting that the Tsubame system is the world’s Greenest Production Supercomputer, meaning that it has Petaflop-class performance; but it’s also extremely power efficient, only consuming slightly over 1 megawatt. This characteristic is something uniquely enabled by GPUs.

MIL EMBEDDED: What is the biggest power consumption? Tens of megawatts?

BERCE: GPU-accelerated supercomputers require about half the power of a CPU-only supercomputer. NVIDIA strives to reduce power requirements with GPUs, so I do not know what the largest power consumption is among CPU-only systems.

MIL EMBEDDED: What value are GPUs adding to the defense industry these days?

BERCE: There are basically six verticals inside the defense space that we’re focused on, where the GPU is adding value of anywhere from a 10 to 100 times performance increase: satellite imaging, video enhancement, aerodynamics/CFD, computer vision, , and electromagnetics. Our defense GPU customers include system integrators, defense contractors, and many other partners.

MIL EMBEDDED: So what types of technical problems can NVIDIA GPUs solve, say, in or video systems?

BERCE: One of our customers, Motion DSP, for example, has software called Ikena ISR. The problem they’re addressing is that when video sensors get deployed in platforms like UAVs, the video acquired is of a quality that makes it challenging to obtain intelligence from.

MIL EMBEDDED: Tell us a little about the hardware that’s being used with the Ikena ISR software.

BERCE: It’s actually just a Dell workstation and inside the Dell system [running Windows], there is a CPU and one Tesla GPU, the C2070. The C2070 has 448 GPU cores, 6 GB of memory, and uses a PCI Express card that is passively or actively cooled. The value the GPU brings to the workflow is the ability to process multiple tasks in . Without having a GPU in the mix, you wouldn’t be able to do this in real time.

MIL EMBEDDED: You’d have to render video offline and then get it all cleaned up.

BERCE: Exactly. In the Motion DSP example, if the system didn’t include a GPU, an hour of video would take somewhere between four and six hours to render. But with the GPU, it’s rendered immediately.

MIL EMBEDDED: What about ISR video analysis powered by a GPU?

BERCE: IntuVision makes a software product called Panoptes, which is designed to enable analytics on streams of video coming in. Often in military reconnaissance or for high-end retail establishments, you have video surveillance at many different spots you want to monitor and might only have one person monitoring up to 25 video feeds. It’s kind of challenging because at any given point, you can have human error and just miss something. But, having that process automated and offloaded to a computer, you know it’s going to sift through all that data. With a CPU, you can roughly process 4 HD streams at 3-5 frames per second, but we’re told that to get really valuable intelligence out of it, processing up to 20 frames per second is needed. By adding a GPU into this mix, you get a 12x speedup for better object tracking and real-time notification for up to 90 HD streams.

MIL EMBEDDED: Do most vendors buying your products – whether they’re software box companies or embedded guys like GE – use a single GPU instead of multiple GPUs, typically, then?

BERCE: The answer depends on the application they’re using. They can use up to eight GPUs in a single operating system. We’re seeing more and more use cases where folks are using more than one GPU. A challenge DARPA has is the desire to see how much a simulation would speed up when you add additional GPUs to the mix. That’s because one of the grand challenge problems they have in computational fluid dynamics is that a single 30-second simulation takes 150,000 CPU hours. So if they can make that simulation GPU aware, they can reduce the 150,000 CPU hours substantially.
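To put rough numbers on the scaling question, here is a minimal Python sketch based on Amdahl's law, a standard speedup model that is not mentioned in the interview. The 150,000 CPU-hour baseline comes from the answer above. Everything else is an assumption for illustration: the function names `amdahl_speedup` and `remaining_cpu_hours` are hypothetical, and the parallel fractions and acceleration factors are made-up inputs, not DARPA or NVIDIA figures.

```python
BASELINE_CPU_HOURS = 150_000.0  # figure quoted in the interview


def amdahl_speedup(parallel_fraction: float, accel_factor: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / accel_factor)


def remaining_cpu_hours(parallel_fraction: float, accel_factor: float) -> float:
    """Equivalent hours left after acceleration, starting from the quoted baseline."""
    return BASELINE_CPU_HOURS / amdahl_speedup(parallel_fraction, accel_factor)


if __name__ == "__main__":
    # Illustrative inputs only: how much of the code parallelizes, and how
    # much faster that portion runs once it is moved to GPUs.
    for fraction in (0.90, 0.99):
        for factor in (10.0, 100.0):
            hours = remaining_cpu_hours(fraction, factor)
            print(f"fraction={fraction:.2f} factor={factor:>5.0f} hours={hours:,.0f}")
```

The point of the model is that the portion of a simulation that stays serial limits the overall gain, so adding more GPUs pays off most when nearly all of the code has been made GPU aware.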

MIL EMBEDDED: Switching gears, rocks are being thrown at NVIDIA because of the closed-naturedness of CUDA versus the freely licensed, open source OpenCL environment.

BERCE: The CUDA programming environment was essentially what enabled NVIDIA to turn the GPU from being a regular graphics processor into a massively parallel processor that can handle the type of work that we’ve been talking about today. It was a far cry from the world of traditional GPGPUs. If you remember about six or seven years ago, people talked about using GPUs for computing, but they were having to translate their code from a computational language into a graphics language so that the GPU could understand it. It was very difficult. CUDA enabled developers to write in industry-standard languages to access the GPU. The very first version was released with CUDA C, and it was basically very similar to C. You just added a few keywords and additional instructions to change your algorithm to target a many-core processor versus fewer cores on a CPU. So there’s CUDA C and now there is also CUDA C++. Third parties such as the Portland Group offer CUDA Fortran as well. Most importantly, last week, the Portland Group introduced CUDA x86, which enables developers to compile CUDA codes to run on CPUs instead of GPUs.

So I wouldn’t say that it’s fair to call CUDA “closed” at this point – it supports multiple languages and open standards and can be modified to run on other architectures. CUDA will continue to be our platform for innovation; we have the ability to innovate incredibly fast. And we are able to add new features very quickly, which is something that customers in this space are very keen on us continuing.

Kevin Berce is Manager of Business Development at NVIDIA, responsible for the Defense and Intelligence business in the U.S. market. He has 15 years of experience in high-performance computing sales and business development in the U.S. Previously, Kevin was a member of SGI’s Federal leadership team. He can be contacted at kberce@nvidia.com.

NVIDIA 408-486-2000 www.nvidia.com
