Serial RapidIO with Intel-based DSP embedded systems saves slots and boosts performance
Until now, it was not possible for x86 DSP system designers to harness native Serial RapidIO support. However, the 2nd-Gen Intel Core i7 processor, along with PCIe2-to-SerialRapidIO2 bridging technology, is bringing high performance to military-use DSP engines.
Serial RapidIO is well established as the switched serial fabric of choice for high-performance Digital Signal Processing (DSP) in embedded military applications. Its use is well defined and supported by the OpenVPX/VITA 65 standard. But until recently, Serial RapidIO has only been practical in Power Architecture-based systems. Thanks to the recent introduction of Intel’s 2nd-Generation Core i7-2715QE quad-core processor and a PCI Express-to-Serial RapidIO bridge, x86-based embedded military DSP systems can now take full advantage of the Serial RapidIO interconnect. The new Core i7 features the numerous performance and efficiency improvements of the Intel Sandy Bridge architecture. It also offers 256-bit wide Advanced Vector Extensions (AVX) floating-point instructions, which effectively double peak floating-point performance compared with the previous 128-bit SSE instructions. These vector instructions form the core of fundamental DSP calculations such as Fast Fourier Transforms (FFTs).
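The doubling follows from register width alone. As a rough illustration, the short Python sketch below counts how many floating-point values fit in one vector register; the helper name is hypothetical, and the result is a per-instruction lane count, not a measured benchmark.

```python
# Illustrative arithmetic only: how many floating-point values fit in one
# SIMD register. The function name is a hypothetical helper for this sketch.

def lanes_per_register(register_bits: int, element_bits: int) -> int:
    """Number of elements a single vector instruction operates on."""
    return register_bits // element_bits

SSE_BITS = 128  # SSE register width cited in the article
AVX_BITS = 256  # AVX register width cited in the article

for name, element_bits in (("single precision", 32), ("double precision", 64)):
    sse = lanes_per_register(SSE_BITS, element_bits)
    avx = lanes_per_register(AVX_BITS, element_bits)
    print(f"{name}: SSE {sse} lanes, AVX {avx} lanes, ratio {avx // sse}x")
```

Real FFT code rarely reaches the peak rate, since memory bandwidth and data shuffling also matter, so the 2x figure is best read as an upper bound.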
Prior to this latest generation of Core i7, there had been no practical support for Serial RapidIO using Intel processors. This greatly limited the viability of the Intel architecture in military and aerospace DSP multiprocessor systems. Attempts at PCIe-to-Serial RapidIO protocol conversion in FPGAs were expensive and lacked the Serial RapidIO messaging support that is so useful for control loops in signal processing applications. Alternative fabrics for Intel architecture-based distributed systems included InfiniBand, which is used in the enterprise computing world but is not preferred by military system integrators, and Gigabit Ethernet. For single board computers, where the requirement is typically a single processor communicating with I/O, these interconnect choices have been sufficient. For multiprocessor DSP systems, however, the lack of native Serial RapidIO support meant that designers either could not use Intel-based processors at all or, if they did, faced performance limitations because Serial RapidIO was unavailable as their data plane interconnect.
The good news is that the 2nd-Generation Intel Core i7, combined with a new generation of PCI Express (PCIe) Gen2-to-Serial RapidIO Gen2 bridges, provides a solution for direct Serial RapidIO support on Intel-based platforms. The following discussion compares Serial RapidIO with 10 GbE in DSP systems and then looks at this enabling bridge technology.
10 GbE versus Serial RapidIO
In comparing Serial RapidIO to 10 GbE, the short answer is that Serial RapidIO provides significantly higher performance and saves valuable board slots compared to 10 GbE implementations (Figure 1). Furthermore, Serial RapidIO supports distributed switch architectures, and Serial RapidIO switches are small, low-power devices (starting at 21 mm x 21 mm, ~3 W). Their size and functionality make it common for board designers to provide a Serial RapidIO switch onboard their DSP engine cards to locally aggregate multiple computing nodes. While it’s theoretically possible to build DSP boards that have an Ethernet switch onboard, Ethernet switches are significantly larger (typically 30 mm x 30 mm to 40 mm x 40 mm) and are not offered in small lane counts, whereas Serial RapidIO Gen2 switches are available in 16- and 32-lane options. On a practical level, these bigger, power-hungry devices, built for use in the enterprise/IT environment, are too cumbersome to deploy onboard a 3U or 6U VPX multiprocessor DSP engine. Where Ethernet switches are used in DSP applications today, they require a separate card, taking up valuable slot space and adding weight (a typical rugged card weighs 1.0-1.2 kg) in SWaP-constrained military platforms. For systems that require a high level of fault tolerance, designers must add a second redundant Ethernet switch, consuming an additional slot and adding even more weight.
Additionally, Ethernet switches exhibit end-to-end packet termination latency that can be on the order of milliseconds, and they require processor intervention to terminate the protocol stack. The resulting performance and overall system power penalties make Ethernet a nonstarter for real-time military and aerospace systems that are constrained in rack space, power consumption, and cooling capacity.
Moreover, 10 GbE is not ideal for supporting all of the system topologies that Serial RapidIO can easily support. The majority of embedded DSP systems deployed today have fewer than eight slots. One of the common topologies used in these distributed processing systems is a full-mesh architecture in which each card is connected directly to every other card. This approach is attractive because it delivers very high card-to-card bandwidth and has no single point of failure. OpenVPX defines four ports on the data plane. A system designer can use these four ports to build five-card distributed systems in which each card has a connection to the other four. While the five-card full mesh is the ultimate in card-to-card bandwidth, larger systems can also be constructed using distributed switching, where packets pass through the switches of intermediate cards. The high bandwidth of Serial RapidIO makes this practical for systems up to 16 slots in size.
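The five-card limit is simple counting: in a full mesh each card needs one port per peer. The Python sketch below works through that arithmetic; the function names are hypothetical, and the four-port figure is the one cited above.

```python
# Sketch of the full-mesh arithmetic described above. Function names are
# hypothetical; the four-port figure is the one cited in the text.

def max_full_mesh_cards(data_plane_ports: int) -> int:
    """Largest full mesh when each card dedicates one port per peer."""
    return data_plane_ports + 1

def full_mesh_links(cards: int) -> int:
    """Point-to-point backplane links needed to connect every card pair."""
    return cards * (cards - 1) // 2

ports = 4
cards = max_full_mesh_cards(ports)
print(f"{ports} ports per card -> up to {cards} cards in a full mesh")
print(f"{cards} cards -> {full_mesh_links(cards)} card-to-card links")

# Larger systems exceed the four available ports, so a full mesh no longer fits.
for n in (6, 8, 16):
    print(f"{n} cards would need {n - 1} ports per card for a full mesh")
```

Beyond five cards the per-card port requirement exceeds the four data plane ports, which is why larger systems rely on packets hopping through the switches of intermediate cards.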
A typical Intel-based DSP system using 10 GbE would require at least six slots, with one for a dedicated Ethernet switch card. A comparable system using Serial RapidIO requires only five slots, because each DSP card can have multiple bridges per processor, mapped into a small onboard Serial RapidIO switch that in turn provides four x4 Serial RapidIO links to the backplane. The benefits of this reduction in slot count go beyond the familiar Size, Weight, and Power (SWaP) savings: minimizing the board count also improves Mean Time Between Failures (MTBF).
Another aspect of DSP system design where Serial RapidIO-based boards provide an advantage over 10 GbE is in hybrid processor/FPGA designs. Virtually every new system design today includes a mix of FPGAs and conventional microprocessors. Implementing Serial RapidIO in an FPGA is more practical than implementing 10 GbE, because terminating 10 GbE requires an additional processor (and software) and can’t be done autonomously in FPGA code.
PCIe2-to-Serial RapidIO2 protocol conversion bridge
The goal of military DSP systems for signal processing applications is optimal bandwidth and reliability in rugged environments. Delivering the near real-time processing of analog sensor data needed to identify signals of interest requires the best achievable combination of data throughput and low latency. In today’s embedded system design environment, that combination is best delivered by pairing the Core i7 with rugged DSP engines based on a PCI Express (PCIe) Gen2-to-Serial RapidIO Gen2 bridge, rather than by a 10 GbE path.
These bridges map PCIe Gen2 into an onboard Serial RapidIO Gen2-based switch and from there into the backplane. The bridges can be as small as 13 mm x 13 mm, and they support both memory-mapped transfers and Serial RapidIO messaging. PCIe Gen2-to-SerialRapidIO2 bridges can deliver 16 Gbps, compared to the 10 Gbps supported by 10 GbE. The performance of 10 GbE drops even further with small packets, which embedded systems prefer for better real-time performance. For 256-byte packets, 10 GbE delivers only 8 Gbps throughput.
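The 16 Gbps figure is consistent with a four-lane link running at 5 Gbaud per lane with 8b/10b line coding. The Python sketch below works through that arithmetic. The lane count, lane rate, and encoding are assumptions about a typical Gen2 x4 configuration and are not stated in the article; the function name is hypothetical.

```python
# Back-of-the-envelope link arithmetic. Lane count, lane rate, and 8b/10b
# encoding are ASSUMPTIONS about a typical Gen2 x4 link, not article data.

def data_rate_gbps(lanes: int, gbaud_per_lane: int,
                   data_bits: int = 8, coded_bits: int = 10) -> float:
    """Usable data rate after line-coding overhead, in Gbps."""
    return lanes * gbaud_per_lane * data_bits / coded_bits

bridge_gbps = data_rate_gbps(lanes=4, gbaud_per_lane=5)  # assumed x4 at 5 Gbaud
ten_gbe_gbps = 10.0  # nominal 10 GbE rate cited in the article

print(f"Bridge data rate under these assumptions: {bridge_gbps:.0f} Gbps")
print(f"Ratio to nominal 10 GbE: {bridge_gbps / ten_gbe_gbps:.1f}x")
```

This counts only line-coding overhead. Packet headers and protocol processing reduce the delivered rate further on either fabric.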
IDT’s Tsi721 is an example of such a PCIe Gen2-to-SerialRapidIO2 bridge, and it makes any processor look like a PowerPC to the Serial RapidIO network. A key feature is that each of the bridge’s 8 DMA and 8 messaging transmit and receive queues can support the full 16 Gbps line rate for packets of 64 bytes and larger. This makes it possible to transfer large amounts of data in a DSP system with low latency at 16 Gbps. Because every queue can sustain the full rate, a given channel can be mapped to a physical core in a processor, or even to a virtual context, maximizing performance at a system level and simplifying system software development.
Figure 2 depicts a rugged, high-performance DSP engine that harnesses the combination of Serial RapidIO and Intel’s latest Core i7: the new OpenVPX CHAMP-AV8 from Curtiss-Wright Controls Embedded Computing. This dual Core i7 DSP engine card uses IDT’s Tsi721 bridge. The CHAMP-AV8’s processors deliver up to 269 GFLOPS. With IDT’s bridge chip, the card delivers triple the bandwidth of first-generation VPX products (up to 160 Gbps fabric performance). This once again proves that Serial RapidIO is a high-performance path to DSP system implementation.
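The 269 GFLOPS figure can be approximately reproduced from first principles. The Python sketch below does so under two assumptions that are not stated in the article: a 2.1 GHz core clock, and 16 single-precision floating-point operations per core per cycle (one 8-wide AVX add plus one 8-wide AVX multiply). The function name is hypothetical.

```python
# Rough peak-throughput estimate for a dual quad-core card.
# ASSUMPTIONS (not from the article): 2.1 GHz core clock, and 16
# single-precision FLOPs per core per cycle (8-wide add + 8-wide multiply).

def peak_gflops(processors: int, cores_per_processor: int,
                clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS; real workloads achieve less."""
    return processors * cores_per_processor * clock_ghz * flops_per_cycle

estimate = peak_gflops(processors=2, cores_per_processor=4,
                       clock_ghz=2.1, flops_per_cycle=16)
print(f"Estimated peak: {estimate:.1f} GFLOPS")
```

Under these assumptions the estimate rounds to the quoted figure. It is a theoretical peak, not a sustained rate for FFTs or other real kernels.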
Editor’s note: Curtiss-Wright Controls has two separate and distinct divisions working on embedded technologies. This article was written by CWCEC (Curtiss-Wright Controls Embedded Computing).
Curtiss-Wright Controls Embedded Computing 703-779-7800 www.cwcembedded.com
IDT 613-592-0714 www.idt.com