Building in rad tolerance - and not as an afterthought

Since no device fabrication technology is 100 percent immune to radiation effects in space, it’s vital to mitigate radiation risks by designing in rad tolerance right from the start. “After the fact” just doesn’t work. (Photo courtesy of NASA/JPL-Caltech/UCLA)

Our mission as engineers and engineering services companies is to develop and deploy products that provide technical solutions to our customers. It is our responsibility to assess and report the risks involved with a particular technology so that mission planners understand how a product will operate in the field.

While we take great care in the design and analysis of each system to ensure the lowest possible risk, some risk is unavoidable. No matter the magnitude, it is our duty to understand, quantify, and communicate this risk. This is particularly important when it comes to mitigating radiation risk in products and systems designed for space. Such risk mitigation must be designed in early, ideally right from the start.

How to mitigate risk

Risks come in all forms: design, environmental, mission, budgetary, and so on. Each can be handled through a four-step risk-mitigation process:

1. Determine the risk

2. Analyze the risk

3. Implement the mitigation plan

4. Validate the mitigation plan

Let’s look at environmental risk. If a product needs to operate at a particular temperature, we would 1) determine which aspects or portions of the system might be marginal in that region; 2) analyze how likely the risk is to be realized by performing system- or card-level thermal analyses; 3) mitigate the risk by designing an appropriate thermal solution such as a heat sink, fan, or heat pipe; and 4) validate the thermal solution through testing in a thermal chamber. Rinse and repeat for all other design risks.

Radiation’s impact on the system

Radiation should be treated as an environment, like thermal, shock, and vibration. Independent of the source or species, radiation exists – always and everywhere – and those boards and assemblies used in such environments need to be characterized for operation. Based on the mission, the environment should be carefully characterized and the solutions tested to obtain an accurate cross-section to determine susceptibility to the environment.
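To make the idea of a cross-section concrete, here is a minimal sketch, assuming the simplest possible model: the cross-section is the number of events observed in a beam test divided by the particle fluence delivered, and the expected on-orbit event count is that cross-section multiplied by a constant flux and an exposure time. The function names and all numeric values are hypothetical, and a real characterization would measure the cross-section as a function of particle energy or LET rather than as a single number.

```python
def cross_section_cm2(events: int, fluence_per_cm2: float) -> float:
    """Event cross-section: observed events divided by delivered fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def expected_events(sigma_cm2: float, flux_per_cm2_s: float, seconds: float) -> float:
    """Expected event count for a constant particle flux over a time window."""
    return sigma_cm2 * flux_per_cm2_s * seconds


# Hypothetical beam test: 25 upsets observed after 1e10 protons/cm^2.
sigma = cross_section_cm2(events=25, fluence_per_cm2=1e10)

# Hypothetical environment: 100 protons/cm^2/s, evaluated over one day.
per_day = expected_events(sigma, flux_per_cm2_s=100.0, seconds=86_400)

print(f"cross-section: {sigma:.2e} cm^2")
print(f"expected upsets per day: {per_day:.4f}")
```

The smaller the measured cross-section, the less susceptible the board is to that particular environment, which is why the test conditions need to match the mission.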

And from our past experience, we have divided the solution sets for radiation capabilities into two primary segments:

1) Process and technology related

2) Design related

Process and technology related effects – that is, radiation-related pathologies determined by a particular technology selection – include Lattice Direct Displacement (LDD) and Total Ionizing Dose (TID). LDD pathologies can be attributed to a neutron environment (envision a wrecking ball hitting the integrated circuit) and typically impact bipolar devices more than CMOS ones. TID, a measure of the amount of energy deposited into a material per unit mass (measured in Rad or kRad), determines how much total dose a particular chip can accumulate before the onset of permanent damage.
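As a rough illustration of how TID feeds into parts selection, the sketch below compares each part's rated dose against the dose expected over the mission and flags parts that fall short of a required margin. Everything here is an assumption made for illustration: the part names, the ratings, the dose rate, the mission length, and the factor-of-two margin are hypothetical, and a constant dose rate is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    tid_rating_krad: float  # rated total dose in kRad (hypothetical value)


def mission_dose_krad(dose_rate_krad_per_year: float, years: float) -> float:
    """Total dose accumulated at a constant dose rate."""
    return dose_rate_krad_per_year * years


def design_margin(part: Part, dose_krad: float) -> float:
    """Ratio of the part's rated dose to the expected mission dose."""
    return part.tid_rating_krad / dose_krad


# Hypothetical parts list and mission profile.
parts = [Part("regulator", 100.0), Part("flash", 15.0)]
dose = mission_dose_krad(dose_rate_krad_per_year=5.0, years=5.0)

REQUIRED_MARGIN = 2.0  # assumed policy, not taken from any standard

failing = [p.name for p in parts if design_margin(p, dose) < REQUIRED_MARGIN]
print(f"mission dose: {dose} kRad, parts below margin: {failing}")
```

A part that fails this check would need shielding, replacement, or a shorter mission, all of which are far cheaper to decide at the start of a design than at the end.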

Design related effects – or all radiation related pathologies that need to be mitigated by design – are not necessarily technology or process related. For example, take a flip flop (circuit with two stable states) controlling a pyrotechnic device. Independent of how the flip flop is manufactured, a catastrophic outcome can result if the device changes state “unintentionally” because of a radiation event. One remedy might be designed-in redundancy through flip-flop triplication and “majority voted” output.
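The triplication and majority vote described above can be sketched in a few lines. This is a software model only, assuming three redundant copies of a value and a two-of-three vote taken bit by bit; the `majority` function and the "fire" flag are hypothetical names, and a flight design would implement the voter in hardware.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-of-three majority vote across three redundant copies."""
    return (a & b) | (a & c) | (b & c)


# Three copies of a hypothetical "fire" flag for the pyrotechnic device.
# 0 means safe, 1 means fire.
copies = [0, 0, 0]

# Simulate a radiation-induced upset in one of the three copies.
copies[1] ^= 1

voted = majority(*copies)
print(f"copies after upset: {copies}, voted output: {voted}")
```

As long as only one copy is upset between votes, the voted output stays correct. The same function works unchanged on multi-bit words, since the vote is taken independently on each bit.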

Currently, no device fabrication process or technology is 100 percent immune to these types of radiation effects: The only way to mitigate them is by design.

The nature of radiation effects

The majority of design-related radiation effects fall under the umbrella of Single Event Effects (SEEs) such as:

  • Single Event Upset (SEU) – Simply defined as a bit flip. This affects all bi-stable elements like flip flops, memories, microprocessors (collections of flip flops and memories), and so on. The key to an SEU is that it is not permanent: the bit can be “unflipped” using Error Detection And Correction (EDAC) or scrubbing.
  • Single Event Transient (SET) – Similar physically to an electrostatic discharge, but the charge accumulation is because of radiation.
  • Single Event Latchup (SEL) – This occurs when a particle, heavy ion, or proton causes a parasitic structure like a thyristor or a Silicon Controlled Rectifier (SCR) within the chip to conduct from the supply to the ground. This might present an upset in operation or permanently damage the component. Note: At Aitech, we classify SEL as a “design” related effect, although it could be categorized as a process technology related effect.
  • Single Event Gate Rupture (SEGR), Single Event Snapback (SESB), and Single Event Burnout (SEB) – Primarily manifested in CMOS devices, power Bipolar Junction Transistors (BJTs), and MOSFETs, these events can cause the power device to conduct current unexpectedly, either as a transient event or permanently in a “latched” state.
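To show how an SEU can be “unflipped,” here is a toy EDAC sketch using a Hamming(7,4) code, which stores four data bits with three parity bits and can locate and repair any single flipped bit. This is an illustrative choice, not a description of any particular product's EDAC; the function names are hypothetical, and practical memories generally protect wider words and add double-error detection.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into a 7-bit Hamming codeword (index 0 unused)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # indices 1..7 match the codeword bit positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    return code


def correct(code: list) -> int:
    """Repair a single flipped bit in place; return its position, or 0 if none."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        code[syndrome] ^= 1
    return syndrome


def decode(code: list) -> int:
    """Extract the 4 data bits from a codeword."""
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)


word = encode(0b1011)
word[6] ^= 1  # simulate a single event upset at bit position 6
flipped_at = correct(word)
print(f"corrected bit position {flipped_at}, data = {decode(word):#06b}")
```

Scrubbing amounts to running a correction pass like `correct` over memory on a schedule, so that single upsets are repaired before a second upset lands in the same word.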

Whether the event is caused by a gamma ray, a proton, a neutron, an electron, or a heavy ion, designers of military, aerospace, and space systems who want to claim radiation hardness need to characterize the susceptibility of the design to all potential hazards of the radiation environment (Figure 1).

Figure 1: Many fields of the military and aerospace industries require reliable, fail-proof components and subsystems that ensure proper system operation.

Of course, one can tailor the applicable set of tests to the particular environment to ensure proper operation in that environment, which is why radiation is classified as an environment. But there is no silver bullet that fits all environments optimally. Knowledge of these effects and their interactions is critical to determine how to handle them in a specific application.

Grades of radiation-tolerance process

Historically, a “rad-hard” design referred solely to total dose effects – typically 100 kRad or better – and required only proper components engineering to implement a design that could survive a given dose. Some military COTS systems might meet this level of hardness, but that’s only the starting point for a total radiation hardened solution.

As feature sizes shrink, and circuits increase in performance and complexity, other radiation-related effects have started to dominate and require mitigation for a fully rad-hard solution. As a result, “rad-hard” has altered its traditional meaning, and the industry is moving towards a more stringent “radiation tolerant” or “radiation characterized” standard. Thus, the term “rad hard” is becoming increasingly misused by those who loosely extend the traditional meaning to encompass all radiation effects.

Outlined below are five grades of radiation tolerance, developed to meet the evolving radiation performance requirements and to support the four-step risk mitigation approach. Figure 2 shows the relative complexity (and associated cost) of the different grades of radiation tolerance.

Figure 2: Levels of rad-tolerant design requirements and their project costs

Grades of radiation tolerance

1. No radiation tolerance: This level, which is also the baseline for the engineering effort required for a given project, provides the customer with a design that assumes no particular radiation environment. Parts and systems qualified to military and space standards are in fact high-reliability systems. The system might display the desired radiation performance, but this is not guaranteed.

2. Total dose mitigation only: Originally known as “rad hard.” The definition has evolved over time to include Levels 3-5 as well. Traditionally, this level involves selecting components based on proven TID performance and, once the design is complete, optionally verifying the design by performing TID testing.

3. SEE rad-tolerant by design: This is the most basic level of single event mitigation. It is purely an engineering solution that mitigates radiation effects by designing redundancy into the system. Although Triple Modular Redundancy (TMR) is considered the “platinum standard” for achieving this goal, it does not scale very well. It is critical at this stage to balance the redundancy with system complexity to secure radiation performance, without reducing system reliability because of the increased complexity.

4. Proton SEE radiation characterized: This level of radiation tolerance is a characterization step. It verifies that the system operates in a given proton environment, and determines the system cross-section, or susceptibility, for a radiation event. This involves testing at radiation test facilities.

5. Heavy ion SEE radiation characterized: This is the most challenging level of characterization, because it involves the most specialized hardware and hardware configurations. Typically, this level of radiation tolerance is required for Geosynchronous Orbits (GEOs), high-inclination Low Earth Orbits (LEOs), and interplanetary missions. But it has also cropped up in some terrestrial applications.
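The Level 3 caution about balancing redundancy against complexity can be illustrated with the textbook reliability expression for a two-of-three system, R_tmr = R_voter × (3R² − 2R³), where R is the reliability of one module. The sketch below assumes independent module failures and a single voter in series; the function name and the sample values are hypothetical.

```python
def tmr_reliability(r_module: float, r_voter: float = 1.0) -> float:
    """Reliability of a 2-of-3 redundant system with a single voter in series."""
    two_of_three = 3 * r_module**2 - 2 * r_module**3
    return r_voter * two_of_three


# Compare a single module against TMR for a few hypothetical reliabilities.
for r in (0.99, 0.9, 0.5, 0.4):
    print(f"module R={r:.2f}  TMR with ideal voter R={tmr_reliability(r):.4f}")

# An imperfect voter can erase the benefit of triplication entirely.
print(f"module R=0.95, voter R=0.90 -> TMR R={tmr_reliability(0.95, 0.90):.4f}")
```

Under these assumptions, triplication helps only when the individual modules are already fairly reliable, and the added voter must be more dependable than the modules it protects. Otherwise the extra complexity lowers overall reliability, which is the trade-off Level 3 asks the designer to weigh.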

Completion of Level 4 is the point at which a product can be called “radiation characterized” or “tolerant,” and Level 5 offers full radiation tolerance.

As per the risk mitigation plan mentioned earlier, until the system has been validated, there is no proof of its performance.

Hence, the grades start at Level 1 and proceed through each level in turn to reach Level 5. The cost of moving from one level to the next is the engineering and testing required to implement and verify the product performance that the next level demands.

Key things to note about radiation tolerance, and its expanding attributes, are:

  • Radiation data is unique. Data from one manufacturer’s device does not necessarily apply to the same part type from another manufacturer.
  • Radiation data is dynamic, not static. As military, aerospace, and space systems evolve to support mission demands, and as processes and feature sizes on integrated circuits shrink, the radiation performance of the systems built from these chips also evolves and changes.
  • Military COTS products are typically at most Level 2 tolerant, with the bulk of product in the Level 1+ regime. Here, 1+ means that typically no more than 10 percent of the work – comprising 2 percent for Level 1 and 8 percent for parts selection for TID – has been performed and thus accounted for.

Mitigate risk ahead of time

Radiation must be treated as an environment, and radiation tolerant products must be designed to mitigate dose and SEE from the very outset. Tolerance is not something that can be “added on” later; radiation tolerant designs must be qualified by design and verified by test. While lot/sample testing and radiation mitigation based on technology are necessary, they’re only the tip of the iceberg.

Philippe Kassouf is CTO, Space Business Sector, at Aitech. He leads a team of engineers in designing space-qualified and radiation-tolerant SBC, Gigabit Ethernet network switch, and Network Interface Card (NIC) solutions. Philippe has been involved in designing buses for micro-satellites, fault-tolerant avionics for satellites and launch vehicle avionics, and the International Space Station (ISS). He can be contacted at pkassouf@rugged.com.

Aitech 888-248-3248 www.rugged.com