
Many-core processor devices head for advanced sensors

Advanced sensor processing continues to demand greater performance and increased frame rates while consuming less of a vehicle’s Size, Weight, and Power (SWaP) budget. The FPGA and, more recently, the General-Purpose Graphics Processing Unit (GPGPU) have been adopted for the highly parallel, repetitive front-end processing of raw data. Both device types use large arrays of simplified execution cores to continually process incoming sensor data streams. This many-core approach to enhanced processor performance is growing rapidly as new devices, such as Tilera’s TILEPro64, become available. Driven by similarly data-intensive applications such as video transcoding, encryption, and deep packet inspection, the many-core tile processor offers another step up in performance for new sensor systems.

Transistor count

Silicon manufacturing processes are currently achieving upwards of 2 billion (2 x 10^9) transistors on a chip. However, because clock speeds are not increasing at the same rate, chip designers use the extra transistors to add functionality and so raise their products’ performance. Because of their legacy requirements, mainstream processors such as Intel’s Core i7 tend to be improved incrementally: by increasing the processing core count, adding new graphics capability, and introducing enhanced 256-bit vector processing capability for each core. Power dissipation becomes an issue as transistor count and clock speeds increase. Power must be managed and traded against ultimate performance, particularly in embedded applications where space and cooling capability are limited.

FPGAs have also benefited from the same process improvements, offering many hundreds of arithmetic cores and more on-chip memory and interconnects. An FPGA-based solution is ideal where ultimate processing density is needed, but it can be difficult to implement and requires intimate device knowledge. An FPGA also requires careful thermal and timing analysis, and neither partial nor online reprogramming has yet matured into a realistic option for deployed capability growth. Despite the real advantages that FPGAs offer, many applications are turning toward software-programmable alternatives for their rapid development and flexibility. The GPGPU and many-core tile processor demonstrate how fresh approaches can use the abundance of available transistors to achieve large arrays of cores for the parallel, multithreaded, repetitive operations needed at the front end of a sensor’s processing chain.

Data flow and processing strategies

Both GPGPUs and many-core tile processors are designed to process multiple data streams at very high rates. Generally, a GPGPU operates best on large, structured data sets running multiple threads of similar algorithms. By comparison, a tile device such as the TILEPro64 has 64 general-purpose processing cores, each with its own L1 and L2 caches, organized in an 8 x 8 matrix with nonblocking, high-speed switched data paths. This makes it well suited to content-based decision making and more varied use of the available cores. Each core can run its own copy of the operating system, many cores can be configured to share a Symmetrical Multiprocessing (SMP) environment, or both arrangements can be combined on the same device.
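As a rough illustration of the 8 x 8 arrangement, the sketch below maps a core index to a grid position and counts the switch hops between two tiles. It is a minimal model only: the row-major core numbering, the Manhattan-distance hop count, and the function names `tile_coords`, `mesh_hops`, and `partition_cores` are assumptions made for illustration and are not taken from Tilera documentation. The last helper shows one hypothetical way to divide the 64 cores between a shared SMP group and cores that run on their own.

```python
GRID_SIZE = 8  # 8 x 8 matrix of cores, as described in the text


def tile_coords(core_id, grid_size=GRID_SIZE):
    """Map a core index to (row, col), assuming row-major numbering."""
    if not 0 <= core_id < grid_size * grid_size:
        raise ValueError(f"core_id {core_id} is outside the {grid_size} x {grid_size} grid")
    return divmod(core_id, grid_size)


def mesh_hops(src, dst, grid_size=GRID_SIZE):
    """Count hops between two tiles, assuming data moves along rows and columns."""
    r1, c1 = tile_coords(src, grid_size)
    r2, c2 = tile_coords(dst, grid_size)
    return abs(r1 - r2) + abs(c1 - c2)


def partition_cores(smp_count, grid_size=GRID_SIZE):
    """Split the cores into one SMP group and a list of standalone cores.

    Hypothetical policy: the first smp_count cores form the SMP group and
    the remainder each run independently.
    """
    total = grid_size * grid_size
    if not 0 <= smp_count <= total:
        raise ValueError("smp_count must be between 0 and the total core count")
    cores = list(range(total))
    return cores[:smp_count], cores[smp_count:]


if __name__ == "__main__":
    smp_group, standalone = partition_cores(48)
    print(tile_coords(63), mesh_hops(0, 63), len(smp_group), len(standalone))
```

In this model, tiles that exchange data frequently are best placed next to each other, since the hop count grows with their separation in the grid.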

The tile processor’s ability to use individual cores, or to share data and run multiple threads across many cores, makes it very efficient at raw data processing tasks such as reorganization, beam forming, and adaptive cancellation. Today, such tile processors do not offer hardware floating-point capability. Software techniques can be adopted for less-demanding applications, although greater performance will require a heterogeneous architecture with a further processing stage based on GPGPUs or vector processors such as AltiVec or Intel’s AVX, while still achieving overall savings in board count.
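One common software technique for cores without hardware floating point is fixed-point arithmetic, in which fractional values are carried as scaled integers. The sketch below uses the Q15 format (16-bit values with 15 fractional bits) to form a weighted sum of channel samples, the kind of operation at the heart of beam forming. The choice of Q15 and the helper names `to_q15`, `q15_mul`, and `weighted_sum` are assumptions for illustration; the article does not say which technique a given application would use.

```python
Q = 15
ONE = 1 << Q  # 32768, the integer that represents 1.0 in Q15


def saturate(value):
    """Clamp an integer to the signed 16-bit Q15 range."""
    return max(-ONE, min(ONE - 1, value))


def to_q15(x):
    """Convert a float in roughly [-1, 1) to Q15, saturating at the limits."""
    return saturate(int(round(x * ONE)))


def q15_mul(a, b):
    """Multiply two Q15 values using integer operations only, with rounding."""
    return saturate((a * b + (1 << (Q - 1))) >> Q)


def weighted_sum(samples, weights):
    """Combine channel samples with per-channel weights, all in Q15.

    Products are accumulated at full width and scaled back once at the
    end, which loses less precision than rounding every product.
    """
    acc = 0
    for s, w in zip(samples, weights):
        acc += s * w
    return saturate((acc + (1 << (Q - 1))) >> Q)


if __name__ == "__main__":
    half = to_q15(0.5)
    print(q15_mul(half, half) / ONE)              # product of 0.5 and 0.5
    print(weighted_sum([half, half], [half, half]) / ONE)
```

The trade-off is dynamic range: values must be scaled to stay within the representable interval, which is why this approach suits the less-demanding applications mentioned above.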

Similar to the FPGA and GPGPU, the many-core tile processor has much greater military application potential than just radar. Sensor fusion and 360° local situational awareness in small agile platforms such as moving armored vehicles and helicopters are additional examples, requiring many channels of video data to be processed at high frame rates. Designed for many of these rugged, complex sensor processing applications, the MCP500 is a TILEPro64-based SBC from GE Intelligent Platforms, offered in 3U and VPX-REDI form factors and providing 2 GB of DDR2 SDRAM, dual 10 Gbps XAUI Ethernet ports, and dual four-lane PCI Express interfaces (Figure 1).

Figure 1: MCP500 many-core SBC from GE Intelligent Platforms

Continuous process technology improvement still supports Moore’s Law rates of growth in transistors, though legacy architecture and packaging technology can constrain mainstream processors from taking full advantage. However, the video, gaming, Internet, communications, financial, and entertainment markets all have the potential to spawn innovative many-core architectures providing additional scope in the future for sensor design improvement and diversity.

To learn more, e-mail Duncan at duncan_young1@sky.com.