FPGA coprocessors for acceleration of shape recognition algorithms in hybrid VPX HPEC systems

To reach the performance levels required by the latest military specifications, electronic warfare (EW) system designers increasingly rely on VPX high-performance embedded computing (HPEC) platforms. To handle the growth of global IP traffic, predicted to reach 132 exabytes (EB) per month in 2018 according to Cisco's Visual Networking Index, electronic systems must move ever larger data flows in and out of semiconductor devices. Field-programmable gate array (FPGA) vendors have developed devices that combine high bandwidth, very high-speed interfaces, and superior parallel-processing power. These devices make it possible to design high-performance hybrid HPEC systems for demanding applications such as ultrafast shape recognition.

Leading FPGA vendors have developed some very high-performing devices. The Xilinx Virtex UltraScale+ VU13P FPGA, for example, offers more than 3.5 million logic cells, 11,904 enhanced slices for digital signal processing (DSP), and as many as 128 GTY transceivers running at up to 32.75 Gb/s, allowing massive data flow and routing while supporting multiterabit-per-second throughput. These GTY transceivers feature auto-adaptive equalization and significantly reduced power consumption. Integrated 100 GbE MAC blocks and PCIe Gen3 cores are also included for faster communications. Designers tend to use very wide buses, up to 2,048 bits, to build massively parallel applications. The challenge in these parallel architectures is clocking: advanced clock management, clock networks centered on user logic, and distributed clock buffers enable designers to maximize performance and reduce dynamic power.
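As a rough check on the multiterabit-per-second figure, the raw aggregate serial line rate is simply the number of transceivers times the per-lane rate, and a parallel bus moves its width in bits once per clock. The sketch below is back-of-envelope arithmetic only: it ignores line-encoding and protocol overhead, and the 400 MHz fabric clock used for the 2,048-bit bus is a hypothetical value chosen for illustration, not a data-sheet figure.

```python
"""Back-of-envelope bandwidth figures for a wide-bus, many-transceiver FPGA."""

GTY_LANES = 128             # maximum GTY transceiver count quoted in the text
GTY_LINE_RATE_GBPS = 32.75  # maximum line rate per GTY transceiver, in Gb/s
BUS_WIDTH_BITS = 2048       # widest parallel bus mentioned in the text
FABRIC_CLOCK_MHZ = 400.0    # hypothetical fabric clock, for illustration only


def aggregate_serial_gbps(lanes: int, line_rate_gbps: float) -> float:
    """Raw aggregate line rate in one direction, before encoding overhead."""
    return lanes * line_rate_gbps


def bus_throughput_gbps(width_bits: int, clock_mhz: float) -> float:
    """Throughput of a parallel bus that transfers one word per clock cycle."""
    return width_bits * clock_mhz / 1_000.0


if __name__ == "__main__":
    serial = aggregate_serial_gbps(GTY_LANES, GTY_LINE_RATE_GBPS)
    print(f"Aggregate GTY line rate: {serial:,.0f} Gb/s (~{serial / 1000:.1f} Tb/s per direction)")
    bus = bus_throughput_gbps(BUS_WIDTH_BITS, FABRIC_CLOCK_MHZ)
    print(f"{BUS_WIDTH_BITS}-bit bus at {FABRIC_CLOCK_MHZ:.0f} MHz: {bus:,.1f} Gb/s")
```

With these inputs the script reports 4,192 Gb/s (about 4.2 Tb/s per direction) for the transceivers and 819.2 Gb/s for the hypothetical bus; usable payload bandwidth is lower once line encoding is taken into account.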

These new devices offer extremely high parallel-processing power relative to their power consumption. In parallel computing, these FPGAs can deliver more than ten times the gigaflops-per-watt performance of the latest processors. To benefit from these figures, systems are now based on tight coupling between FPGAs and high-end processors, and high-speed backplanes can carry huge data flows between clusters of FPGA boards and processor boards. The .1 and 66.4 standards were developed in particular to support the high transceiver line rates of these FPGAs.

Innovative FFTs and DCTs in low-power implementations

Interface Concept is participating in a research and development program, in cooperation with France's University of Western Brittany (Université de Bretagne Occidentale, UBO), aimed at applying these technologies to shape recognition. The first part of this program consisted of developing innovative implementations of fast Fourier transforms (FFTs) and discrete cosine transforms (DCTs) in FPGAs that greatly reduce execution time while using fewer resources. In this joint endeavor, the teams measured FPGA resources as the number of lookup tables (LUTs) used and related that count both to FFT throughput (transforms per second) and to full execution time. Compared with the FFT IP currently on the market, the team improved throughput per LUT by a factor of four and reduced the product of logic resources and execution time by a factor of nearly two. For the DCT, the joint project reduced the product of logic resources and execution time by a factor of three. These improvements were achieved while keeping power consumption under close control: running in continuous mode, one 1,024-point FFT consumes about 100 mW in a Virtex-6 FPGA. For 4,096 points, the power consumed rises to 150 mW; for a 64-point FFT, it drops to only 50 mW.
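The article does not describe the internals of the optimized FFT and DCT cores. Purely as a point of reference, the sketch below is a textbook iterative radix-2 Cooley-Tukey FFT and a direct (unnormalized) DCT-II in plain Python, the kind of floating-point golden model often used when verifying a hardware transform; it is not the UBO/Interface Concept design. The helpers `throughput_per_lut` and `area_time_product` are hypothetical names for the two figures of merit used above.

```python
import cmath
import math


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages, each made of n/2 butterflies.
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * math.pi * k / size)  # twiddle factor
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
        size *= 2
    return a


def dft_direct(x):
    """O(n^2) DFT, used only to check fft_radix2."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def dct2_direct(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n = len(x)
    return [sum(x[m] * math.cos(math.pi / n * (m + 0.5) * k) for m in range(n))
            for k in range(n)]


def throughput_per_lut(transforms_per_second, luts):
    """Hypothetical figure of merit: higher is better."""
    return transforms_per_second / luts


def area_time_product(luts, execution_time_s):
    """Hypothetical figure of merit: lower is better."""
    return luts * execution_time_s


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(64)]
    err = max(abs(a - b) for a, b in zip(fft_radix2(signal), dft_direct(signal)))
    print(f"max |FFT - DFT| for 64 points: {err:.2e}")
```

In a hardware flow, a model like this would typically be compared against the fixed-point output of the FPGA core, with a tolerance chosen to match the core's word length.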

In the second part of this research program, these performance improvements and power savings were used to accelerate shape-recognition algorithms implemented on HPEC platforms.

The shape-recognition algorithm is based on computing a digital correlation between a target image and a reference image. The correlation is carried out by multiplying the spectrum of the target image by the complex conjugate of the spectrum of the reference image (a 2-D FFT gives the spectrum of each image) and then taking the inverse 2-D FFT of the product. Finally, the energy of the correlation peak, normalized to the total energy of the correlation plane, is computed. If this normalized peak energy is above a given threshold, the target image is declared to match the reference image. If the threshold is not reached, another target image is loaded and the process is repeated. To improve the performance of the process, the spectrum of the reference image (called the filter) is transformed into an adapted filter, which is then used in the spectrum multiplication before the inverse FFT. Overall, this process yields a high-performing algorithm with fine discrimination when applied to face recognition, and the same approach may be applied to other kinds of shape recognition.
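A minimal NumPy sketch of the correlation and decision steps described above, assuming same-sized grayscale images held in 2-D arrays. It uses the classical matched filter (the complex conjugate of the reference spectrum); the optional phase-only filter is one well-known example of an adapted filter, included for illustration only and not necessarily the transformation used in this program. The function names and the threshold value are hypothetical.

```python
import numpy as np


def correlation_plane(target, reference, phase_only=False, eps=1e-12):
    """Frequency-domain cross-correlation of two same-sized 2-D images."""
    target_spec = np.fft.fft2(target)
    filt = np.conj(np.fft.fft2(reference))  # classical matched filter
    if phase_only:
        # Illustrative adapted filter: keep only the phase of the reference spectrum.
        filt = filt / (np.abs(filt) + eps)
    return np.fft.ifft2(target_spec * filt)


def peak_energy_ratio(plane):
    """Energy of the correlation peak normalized to the total energy of the plane."""
    energy = np.abs(plane) ** 2
    return float(energy.max() / energy.sum())


def peak_location(plane):
    """(row, col) of the correlation peak, i.e. the estimated target offset."""
    idx = np.unravel_index(np.argmax(np.abs(plane)), plane.shape)
    return tuple(int(i) for i in idx)


def find_match(targets, reference, threshold, phase_only=False):
    """Index of the first target whose normalized peak energy exceeds threshold, else None."""
    for i, target in enumerate(targets):
        plane = correlation_plane(target, reference, phase_only)
        if peak_energy_ratio(plane) > threshold:
            return i
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((64, 64))
    unrelated = rng.standard_normal((64, 64))
    shifted = np.roll(reference, shift=(5, -3), axis=(0, 1))  # circularly shifted copy
    for name, img in [("unrelated", unrelated), ("shifted copy", shifted)]:
        plane = correlation_plane(img, reference)
        print(f"{name:>12}: ratio={peak_energy_ratio(plane):.4f} peak at {peak_location(plane)}")
    print("first match:", find_match([unrelated, shifted], reference, threshold=0.1))
```

Because the correlation is computed in the frequency domain, the shifted copy still produces a sharp peak, located at the shift offset (5, 61) on the 64 × 64 grid, with a normalized peak energy far above that of the unrelated image; the threshold is placed between the two.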

This correlation architecture has been implemented in Xilinx FPGAs. One XC7VX690T can simultaneously host about 30 instances of the correlation architecture described above, each able to process about 4,000 images per second. At this rate, one XC7VX690T can process and make a decision on 120,000 images per second (30 × 4,000).

Interface Concept has developed FPGA boards intended for heavy signal processing and for communication with the processor board. One example is the IC-FEP-VPX6b board, which features two Virtex-7 XC7VX690T FPGAs and two FMC slots (Figure 1). It also features a QorIQ processor acting as a PCI Express root complex to control the board. PCIe is used for VPX data-plane communication between the boards, and four PCIe fat pipes are available on the P1 connector for this data plane. Each FMC slot can host one IC-OPT-FMCa mezzanine, which provides 12 optical fiber links at 10 Gb/s each, a rate the FPGA transceivers behind it can sustain. Eight lanes are connected through the FMC VITA 57.1 connector to eight GTH transceivers, which feed the Virtex-7 FPGA with the flow of images to be filtered by the correlation algorithm. All these products can be used in a conduction-cooled VPX architecture, which allows operation in constrained environments where no cooling airflow is available.

Figure 1: IC-FEP-VPX6b featuring two Virtex-7 FPGAs.
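A quick sanity check relates the optical link rate to the 120,000 images per second quoted for one XC7VX690T. The sketch assumes, hypothetically, that each FPGA is fed by the eight GTH lanes of one FMC at 10 Gb/s, that the links use 64b/66b encoding, and that images are 64 × 64 pixels at 8 bits per pixel. None of these parameters is stated in the article, so substitute real values before drawing conclusions.

```python
"""Hypothetical link-budget check: can the optical lanes feeding one FPGA
carry the image stream that its correlation engines consume?"""


def link_payload_gbps(lanes, line_rate_gbps, encoding_efficiency):
    """Usable payload rate after line-encoding overhead."""
    return lanes * line_rate_gbps * encoding_efficiency


def image_stream_gbps(images_per_second, width, height, bits_per_pixel):
    """Raw pixel bandwidth needed for a stream of uncompressed images."""
    return images_per_second * width * height * bits_per_pixel / 1e9


if __name__ == "__main__":
    capacity = link_payload_gbps(lanes=8, line_rate_gbps=10.0,
                                 encoding_efficiency=64 / 66)  # assumed 64b/66b
    demand = image_stream_gbps(images_per_second=120_000,      # per XC7VX690T
                               width=64, height=64,            # hypothetical size
                               bits_per_pixel=8)
    print(f"link payload: {capacity:.1f} Gb/s, image demand: {demand:.2f} Gb/s")
    print("headroom factor:", round(capacity / demand, 1))
```

Under these assumptions the lanes offer roughly 20 times the required bandwidth; larger images or deeper pixels shrink that margin proportionally.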

By using a VPX hybrid HPEC platform that integrates five IC-FEP-VPX6b boards, each fitted with two IC-OPT-FMCa optical FMC mezzanines, together with one IC-INT-VPX6b board, users can build a system based on the correlation architecture above that processes up to 1.2 million images per second: the five boards carry ten XC7VX690T FPGAs, each handling 120,000 images per second.
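The 1.2 million figure follows from multiplying out the numbers given earlier. The sketch below simply encodes that multiplication so other configurations can be explored; the class and field names are hypothetical, and it assumes throughput scales linearly with the number of correlation engines, i.e., that input links, memory, and the data plane are not bottlenecks.

```python
"""Throughput scaling from one correlation engine to a multi-board VPX platform."""

from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformConfig:
    boards: int               # IC-FEP-VPX6b boards in the chassis
    fpgas_per_board: int      # XC7VX690T devices per board
    engines_per_fpga: int     # correlation architectures per FPGA
    images_per_engine_s: int  # images per second per correlation architecture

    def images_per_fpga_s(self) -> int:
        return self.engines_per_fpga * self.images_per_engine_s

    def images_per_platform_s(self) -> int:
        # Linear scaling assumption: no shared bottleneck between FPGAs.
        return self.boards * self.fpgas_per_board * self.images_per_fpga_s()


if __name__ == "__main__":
    cfg = PlatformConfig(boards=5, fpgas_per_board=2,
                         engines_per_fpga=30, images_per_engine_s=4_000)
    print(f"per FPGA:     {cfg.images_per_fpga_s():,} images/s")      # 120,000
    print(f"per platform: {cfg.images_per_platform_s():,} images/s")  # 1,200,000
```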

By taking advantage of the latest FPGA technologies, low-power optical transceivers, and processors, it is possible to build rugged, very high-performance hybrid HPEC architectures that greatly accelerate algorithms suited to parallel processing, such as the shape-recognition algorithms used in EW systems.

Thierry Wastiaux is senior vice president of sales at Interface Concept, a European manufacturer of electronic embedded systems for defense, aerospace, telecom, and industrial markets. He has 25 years of experience in the telecom and embedded systems markets, having held positions in operations, business development, and executive management. Prior to joining Interface Concept, he was responsible for the operations of the Mobile Communication Group and the Transmission Business Unit at Alcatel-Lucent. He holds an M.Sc. from France's Ecole Polytechnique. Readers may contact him at twastiaux@interfaceconcept.com.

Interface Concept www.interfaceconcept.com