Ensuring readiness of parts for space
High-performance computing in satellites requires the use of advanced, commercial off-the-shelf (COTS) technologies in an environment that has previously been the domain of parts produced in rad-hard foundries. The challenges of this environment require careful implementation of both hardware and software techniques to produce space assets with advanced capabilities and the operational reliability required by the warfighter.
Space is a difficult place to do business. While charged particles trapped in the Van Allen belts can produce dazzling aurora effects for earth-bound observers, degradation due to the natural radiation environment can be a critical factor in determining the operational life of equipment used in commercial and military missions. Unfortunately, in the harsh radiation environments of space, the need for robust technologies often conflicts with the need for performance. New technologies can provide opportunities for great success and for spectacular failure. This last point is especially true if new features within the devices are not well understood, or if the automatic features of design tools trap the unwary.
In general, radiation effects can be placed into two broad categories. Wear-out caused by total ionizing dose (TID) takes place gradually over the course of a mission as a part repeatedly encounters particles that ionize the semiconductor material. In more complex integrated circuits such as memories and microprocessors, changes at the transistor level turn into functional failures when increased propagation delay, reduced drive strength, or an inability of a digital device to switch state makes a part inoperable. Single-event effects (SEE) occur when a single, highly ionizing particle interacts with a semiconductor. The resultant charge reorganization can last anywhere from a few nanoseconds to a few microseconds, with effects that range from small, benign transients to catastrophic failure of the device.
At geosynchronous orbits, SEE are usually initiated by galactic cosmic rays that have traveled for light years to cause problems in your parts. Within the earth’s magnetic field, by contrast, SEE are caused primarily by protons that undergo nuclear reactions with the silicon in the semiconductor, creating daughter products that leave the damaging ionization track.
Frequently, the problem of radiation effects has been mitigated through the use of radiation-hardened (RH) parts. The rapid development of the commercial semiconductor industry, however, makes it difficult for RH parts to keep pace with the higher memory densities and increased processing performance of COTS parts. In many cases no RH equivalents are available. For these COTS parts to be used, some level of radiation mitigation has to be considered that takes into account how the parts, the environment, and the protection will interact.
High-performance computing = high-capability missions
A variety of techniques and a mixture of COTS and RH parts can enable reliable computing performance (Figure 1). At the heart of the single-board computer (SBC) are three PowerPC processors arranged in a triple modular redundant (TMR) architecture. Detection hardware compares each output from all three processors every clock cycle. Periodic resynchronization (scrubbing) purges latent errors by clearing the contents of all three processors. In the event of an error, the upset processor is disabled until it can be scrubbed. However, the system continues to operate normally, because the voted output continues to be valid. By varying the scrub rate, the system upset rate can be tailored to the mission requirements.
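The voting and scrubbing behavior described above can be illustrated with a minimal Python sketch. It is a software stand-in for what the article describes as hardware, not the actual SBC logic: the function names `vote` and `interval_failure_probability` are hypothetical, and the probability model assumes that upsets in the three processors are independent and arrive at a constant rate, which the article does not state.

```python
import math
from collections import Counter


def vote(outputs):
    """Majority-vote the outputs of three processors.

    Returns (voted_value, faulty_index). faulty_index is None when all
    three agree. voted_value is None when no two outputs agree.
    """
    if len(outputs) != 3:
        raise ValueError("expected exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count == 3:
        return value, None
    if count == 2:
        # Exactly one processor disagrees; flag it so it can be scrubbed.
        faulty = next(i for i, out in enumerate(outputs) if out != value)
        return value, faulty
    return None, None


def interval_failure_probability(upset_rate, scrub_interval):
    """Chance that two or more of three processors are upset between scrubs.

    Assumes independent upsets at a constant rate (an assumption of this
    sketch). upset_rate and scrub_interval must use matching time units.
    """
    p = 1.0 - math.exp(-upset_rate * scrub_interval)
    return 3 * p**2 * (1 - p) + p**3
```

Under these assumptions the chance of losing the voted output in one interval scales roughly with the square of the scrub interval when upsets are rare, which is one way to see why a faster scrub rate lowers the system upset rate.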
Both the volatile and nonvolatile memory are commercial devices protected by error correction codes. The SDRAM uses Reed-Solomon configured as 64 data bits and 32 check bits; this configuration detects and corrects any two-device failures. EEPROMs have historically shown immunity to SEU, and as a result the EEPROMs can operate with less robust ECC and are implemented with single-bit error correction. On the other hand, the EEPROMs have a lower TID tolerance and are packaged using DDC’s RadPak technology to shield the die and reduce the dose inside the package.
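As an illustration of what single-bit error correction involves, the following sketch implements a textbook Hamming(7,4) code in Python. This is an assumption chosen for brevity: the article does not say which code protects the EEPROMs, and the function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit Hamming codeword (list of bits).

    Bit positions are numbered 1..7; parity bits sit at positions 1, 2, 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0 is unused so indices match positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Decode a 7-bit codeword, correcting at most one flipped bit.

    Returns (nibble, error_position), where error_position is 0 for a
    clean word or the 1-based position of the corrected bit.
    """
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid word
    if syndrome:
        code[syndrome] ^= 1  # the syndrome equals the position of the bad bit
    data = (code[3], code[5], code[6], code[7])
    nibble = sum(bit << i for i, bit in enumerate(data))
    return nibble, syndrome
```

A code of this kind corrects one flipped bit per word and miscorrects if two bits flip, which is why it suits a memory with a low upset rate and would be too weak for the SDRAM.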
A handful of rad-hard parts are used in select functions where developing solutions from commercial parts is too difficult or expensive to be practical. These functions include the 1553 interface and the field-programmable gate array (FPGA). Because of their flexibility, FPGAs provide a fast route to developing rad-hard computing solutions. In the case of the SCS750, the flip-flops in the FPGA are TMR protected and used to support the voting circuitry for the processors and error correction for the memories (SDRAM and flash).
The flash memory revolution
Flash memory is everywhere: mobile phones, automobiles, industrial robotics, and the list goes on. The reasons to pull flash memory into space missions are fairly obvious. Along with high-volume production comes the advantage of a sizeable industrial base producing highly reliable parts. With extremely high densities (from several to hundreds of Gb), NAND flash devices are appealing as mission enablers.
However, the additional capabilities do not come without a few additional quirks. Unlike older technologies such as SRAM, NAND devices develop bad bits naturally under normal operating conditions. Even in terrestrial environments, ECC is required to achieve good endurance and retention characteristics. On Earth, single corrupted bits occur due to charge leakage in a memory cell; in space, however, the upset mechanisms are more diverse. Multiple-bit upsets and functional interrupts need to be taken into account. As always, though, the solution can be tailored to the mission. In gentler orbits, a robust ECC such as a BCH code can take care of the error rate, while in harsher environments the ECC may need to be combined with a TMR architecture (or redundant copies) to handle functional interrupts.
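The redundant-copies idea can be sketched as a bitwise majority merge across three stored copies of the same page. The Python below is a minimal illustration under stated assumptions: the name `majority_merge` is hypothetical, the three copies are assumed to live on separate devices so that a functional interrupt corrupts only one of them, and any per-copy ECC decoding is assumed to have run beforehand.

```python
def majority_merge(copy_a, copy_b, copy_c):
    """Rebuild a page from three redundant copies by bitwise majority.

    Each output bit takes the value held by at least two of the three
    copies, so any corruption confined to a single copy is voted out.
    """
    if not (len(copy_a) == len(copy_b) == len(copy_c)):
        raise ValueError("all three copies must have the same length")
    return bytes(
        (a & b) | (a & c) | (b & c)
        for a, b, c in zip(copy_a, copy_b, copy_c)
    )
```

Because the vote is taken bit by bit, the merge still succeeds when two copies each carry errors, provided those errors do not land on the same bit.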
Like many COTS parts, flash memories are susceptible to TID degradation at doses well below those required for many missions. These effects are typically worse for multilevel cell (MLC) devices, where multiple bits are stored in a single data cell and the tolerance to a threshold shift is much smaller. As a result, space designers tend to gravitate toward single-level cell (SLC) devices, which operate with significantly more margin. TID tolerance can vary significantly depending on the feature size. In many cases, total dose effects may be mitigated by adding shielding to the device package.
The new frontier
While military electronics no longer lead the market in innovation, different techniques have made it possible to use commercial parts in systems that require high reliability. One unexpected consequence of this new reality is a change in qualification processes in recent years. In the past, all parts going to space were expected to be bulletproof; now, in contrast, designers have come to grips with the challenges of using commercial parts. They’ve compensated for this shift by either using heavily redundant designs or mounting low-cost, short-duration missions. A prominent example of the heavily redundant approach comes from SpaceX, whose Dragon supply ship can lose an engine yet still successfully complete its mission. The low-cost, short-duration category, meanwhile, is typified by the advent of the CubeSat. In both cases, the new approach has opened the door for low-cost, high-performance missions, which promise exciting new capabilities and discoveries in the years to come.
DDC Inc. www.ddc-web.com