Gaming chip changes mil/aero game

GTC – the Graphics Processing Unit (GPU) Technology Conference – used to be the preserve of video gaming mavens, but as General Purpose GPU (GPGPU) computing has taken off in the wider world, attendees now hail from far more varied backgrounds. Lately these non-gamer aficionados have become even more excited. What’s going on?

The answer is NVIDIA’s new Tegra K1 System-on-Chip (SoC). This “superchip” combines a quad-core Central Processing Unit (CPU) with 192 parallel processing GPU cores. Intended for mobile devices such as handheld gaming units, cell phones, and tablet computers, the Tegra K1 scored an impressive 60 frames per second in the GFXBench benchmark at 1920x1080 resolution. But the new SoC’s most compelling feature is that it can achieve this performance while consuming less than 10 watts of power.

What does that mean to non-gamers? Designers of High Performance Embedded Computing (HPEC) systems in the military/aerospace domain, for example, scent major benefits. Tegra K1’s combination of high performance and modest power appetite could open a wealth of new applications and bring vastly improved reconnaissance capabilities to small formations and individual soldiers. That’s because its Size, Weight, and Power (SWaP) advantages will tell most powerfully in the smallest platforms, which previously have been extremely limited in processing power.

Historically, the bandwidth of the downlink from an unmanned vehicle to its ground station has limited the vehicle’s sensor capability. Why put a high-definition video sensor on a vehicle if the onboard processor could not crunch the sensor data fast enough to get information down the pipe to the user within tactical timelines? The performance limitation meant, for example, that instead of reducing sensor data to an area of interest, the whole image might have to be streamed down and be processed on the ground, adding latency to the process. The number of sensors and the resolution of the sensors were limited. If there were multiple sensors – such as a visible light and an infrared sensor – they might have to be used consecutively rather than simultaneously.
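The arithmetic behind that constraint is easy to sketch. The short Python example below uses assumed, illustrative numbers – an uncompressed 24-bit 1080p stream at 30 frames per second, and a hypothetical 256x256 region of interest (ROI) – to show how much downlink capacity onboard processing can save by transmitting only the area of interest:

```python
# Back-of-envelope illustration with assumed numbers: downlink load for a
# raw 1080p color stream versus a small cropped region of interest (ROI).
FPS = 30
BYTES_PER_PIXEL = 3  # 24-bit color, uncompressed

def downlink_mbps(width, height, fps=FPS, bpp=BYTES_PER_PIXEL):
    """Raw bandwidth in megabits per second for an uncompressed stream."""
    return width * height * bpp * fps * 8 / 1e6

full = downlink_mbps(1920, 1080)  # whole frame streamed to the ground
roi = downlink_mbps(256, 256)     # onboard processing extracts a small ROI

print(f"Full frame: {full:.0f} Mbit/s, ROI only: {roi:.1f} Mbit/s")
# → Full frame: 1493 Mbit/s, ROI only: 47.2 Mbit/s
```

Real systems compress the video, of course, but the ratio between the two figures – roughly 30:1 here – is what onboard processing buys within a fixed downlink budget.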

By the same token, however, if processor resources could be shrunk and their power requirement reduced, more could be achieved onboard within the downlink’s bandwidth constraints. More sensors – higher-resolution sensors – could be added. This advanced processing technology could also be retrofitted to the underserved population of small platforms.

Game changer?

The Tegra K1 chip promises to enable these changes (see Figure 1). Its sub-10-watt power budget is an order of magnitude less than what is often required today for a GPU subsystem. An image processing system today, in a 3U VPX format, might require two cards – a single board computer card and a GPGPU card – burning a combined 100 watts. Tegra K1, however, could potentially consolidate that capability into a single 3U VPX board at a fraction of the power budget. The smaller batteries required at this power level would further improve the SWaP equation for smaller platforms.

Figure 1: NVIDIA’s Tegra K1 promises to bring the high performance graphics processing capability of subsystems like GE’s MAGIC1 to a new generation of smaller, lighter, more portable platforms.

Algorithm agnostic

Unlike its predecessor, NVIDIA’s Tegra 3 mobile GPU chip, the Tegra K1’s GPU cores can be programmed via the CUDA model, which means that algorithms developed for applications running on more complex, power-hungry processing subsystems can be adapted to the simpler hardware. These include image processing algorithms such as stabilization, fusion, tracking, and Ground Moving Target Indicator (GMTI).
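As a rough illustration of why such algorithms port well, the Python sketch below (hypothetical code, not drawn from any NVIDIA library) performs a simple weighted fusion of a visible-light frame and an infrared frame. Because each output pixel depends only on the corresponding input pixels, the loop maps naturally onto one CUDA thread per pixel on the K1’s GPU cores:

```python
# Minimal sketch (illustrative only): pixel-level fusion of a visible-light
# and an infrared frame by weighted blending. Each output pixel depends only
# on the corresponding input pixels, so the loop is embarrassingly parallel
# and would map directly onto one CUDA thread per pixel.
def fuse_frames(visible, infrared, alpha=0.6):
    """Blend two equal-sized grayscale frames; alpha weights the visible band."""
    assert len(visible) == len(infrared)
    return [
        round(alpha * v + (1.0 - alpha) * ir)
        for v, ir in zip(visible, infrared)
    ]

# Tiny example: a 4-pixel "frame" from each sensor.
fused = fuse_frames([100, 200, 50, 0], [0, 100, 250, 255])
print(fused)  # → [60, 160, 130, 102]
```

A production CUDA version would replace the list comprehension with a kernel launched over the image grid, but the per-pixel math – and hence the algorithm development work – carries over essentially unchanged.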

GPUs have been attractive to mil/aero developers, especially since the introduction of GPUDirect Remote Direct Memory Access (RDMA), a method of transferring sensor data directly from a data aggregator such as an FPGA to GPU memory. Under GPUDirect the CPU still initiates the request and collects the results, but has much less work to do in transferring the data to GPU memory. A three-step data transfer process is condensed to two steps, freeing the CPU to manage more GPUs and accelerating the data processing cycle.

While these developments aren’t earth-shaking for large platforms with power to spare for image processing, the implications of the new chip for small platforms are likely to have a significant and lasting effect, benefiting the most vulnerable and underserved users in the field.