Multicore processors in the mission-critical context

September 12, 2019

Guillem Bernat

Rapita Systems

Multicore processors are increasingly being adopted in the critical systems domain, especially within the mission-critical military context. They offer a solution both to the dwindling long-term availability of single-core processors and to the growing demand for processing power that innovation in military systems brings. Because multicore processors offer neither a deterministic environment nor predictable software execution times, a new verification approach, one that solves the challenges of multicore timing analysis, is needed for their safe use.

Continued progress on the SWaP (size, weight, and power) characteristics of processors has resulted in multicore-powered cellphones with more computing power than the Apollo 11 lunar lander. The benefits of multicore processors have led to their widespread adoption across mainstream technology industries, with single-core processors now representing only a tiny share of the market. Due to this shift, chip manufacturers are moving away from producing these legacy processors, and their long-term availability is in serious doubt.

As the supply of single-core processors continues to diminish and modern embedded systems continue to grow in popularity, the adoption of multicore processors is inevitable. The safe use of these processors in mission-critical military domains is challenging, however, as they offer neither a deterministic environment nor predictable software execution times.

The gold standard of military avionics certification

DO-178C is the primary document by which prominent certification authorities such as the FAA [Federal Aviation Administration] and EASA [the European Union Aviation Safety Agency] approve all commercial software-based aerospace systems. Over the years, it has also become the de facto gold standard for the use of software in military avionics systems.

The FAA has supplemented DO-178C guidance with Position Paper CAST-32A, titled “Multi-core Processors,” to address the increasing use of multicore processors in aviation.

The U.S. Army’s designated primary Airworthiness Authority, the AMRDEC Aviation Engineering Directorate (AED), published a [draft] guidance document called “Multi-Core Processor (MCP) Airworthiness Requirements,” in which DO-178C and CAST-32A objectives are identified as guidelines that can be used in satisfying the MCP [multicore processor] airworthiness requirements.

Timing analysis is one of the core objectives identified in CAST-32A guidance and is addressed specifically by the objective dubbed MCP_Software_1, which requires evidence demonstrating that all hosted software components function correctly and have sufficient time to complete their execution when operating in their multicore environment. This is a very challenging objective to satisfy and has proven to be a serious obstacle for military and aerospace companies aiming to certify multicore projects.
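As a rough illustration of the "sufficient time" part of this objective, the following Python sketch compares the worst execution time observed for a component against its allotted time budget. The function name, the microsecond unit, and the 20 percent safety margin are assumptions made for illustration only; none of them comes from CAST-32A, and a real certification argument would need far more evidence than a single comparison.

```python
from typing import Sequence


def has_sufficient_time(observed_times_us: Sequence[float],
                        budget_us: float,
                        margin: float = 0.2) -> bool:
    """Return True if the worst observed time, inflated by a safety
    margin, still fits inside the time budget.

    observed_times_us: execution times measured on the multicore target.
    budget_us: time allotted to the component (hypothetical value).
    margin: fractional safety margin (an assumption, not a mandated figure).
    """
    if not observed_times_us:
        raise ValueError("at least one measurement is required")
    worst_observed = max(observed_times_us)
    return worst_observed * (1.0 + margin) <= budget_us
```

The check is only as good as the measurements fed into it: if the observed times were gathered without realistic interference from other cores, the result says little about behavior in the deployed multicore environment.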

Analyzing multicore timing behavior

Verification solutions designed for the timing behavior of single-core systems are not applicable to multicore timing analysis, chiefly because they fail to account for the effects of interference caused by resource contention. To verify the timing behavior of multicore systems, new methods are needed that specifically address the challenges of multicore timing analysis.

Accounting for resource contention and interference

The timing behavior of a task in a multicore system is affected not only by the software running on its core and that software's inputs, but also by contention over resources such as buses, caches, and GPUs that are shared with tasks running on other cores. To design experiments to analyze the timing behavior of a multicore system, sources of interference must be identified and accounted for.

Figure 1 shows a simplified example of a multicore architecture where a bus is shared between multiple cores. Traffic caused by Core N's accesses to this bus is likely to have an impact on the timing behavior of an application running on Core 0 that needs to access the same bus.

Figure 1 | Example of interference channel in a multicore system.

Assumptions must be tested

To analyze the timing behavior of a multicore system, some assumptions must necessarily be made about the behavior of the system under study, including the effects of the interference channels present. Due to the complexity of multicore systems, seemingly logical assumptions may later be proved incorrect, potentially requiring an iterative process of making assumptions, testing them, and using analysis results to refine assumptions for the next round of testing.

This is best explained with a practical example: a study of how sensitive a memory-intensive application running on a Xilinx Zynq UltraScale+ ZCU102 target board is to different levels of interference. The Application Processing Unit on which the application ran has four cores. Based on prior knowledge of the system, the reasonable assumption was made that the L2 cache is a major interference channel for this application. To validate this assumption, a test was performed in which the application ran while tasks on between zero and three contender cores made sustained accesses to the L2 cache. (Figure 2.)

Figure 2 | CPU cycles and L2 cache misses.

If the assumption were valid, then both the number of L2 cache misses and the number of CPU cycles taken for the application to execute would increase with each additional contender core. The figure shows that this held until the introduction of a third contender core. The third contender increased the number of CPU cycles, but the number of L2 cache misses remained about the same as when only two contender cores were active.
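The check described here, that both metrics should keep rising as contenders are added, can be expressed as a small Python sketch. The function names and the 5 percent tolerance are assumptions introduced for illustration, and the numbers in the usage note are placeholders rather than the measurements behind Figure 2.

```python
from typing import Dict, Optional, Sequence


def first_flat_step(values: Sequence[float],
                    tolerance: float = 0.05) -> Optional[int]:
    """Return the index of the first measurement that fails to exceed the
    previous one by more than `tolerance` (relative), or None if the
    metric rises at every step.

    values[i] is the metric measured with i contender cores active.
    """
    for i in range(1, len(values)):
        if values[i] <= values[i - 1] * (1.0 + tolerance):
            return i
    return None


def check_channel_assumption(cycles: Sequence[float],
                             l2_misses: Sequence[float],
                             tolerance: float = 0.05) -> Dict[str, Optional[int]]:
    """Report, per metric, the contender count at which the expected
    increase stops (None means the assumption held throughout)."""
    return {
        "cycles": first_flat_step(cycles, tolerance),
        "l2_misses": first_flat_step(l2_misses, tolerance),
    }


# Placeholder values only, indexed by number of contender cores (0 to 3).
example = check_channel_assumption(
    cycles=[100.0, 130.0, 170.0, 230.0],
    l2_misses=[10.0, 20.0, 30.0, 30.5],
)
```

With these placeholder inputs, `example["cycles"]` is `None` while `example["l2_misses"]` is `3`: cycles keep growing but misses flatten at the third contender, the kind of mismatch that signals the assumed interference channel does not explain all of the slowdown and that the assumption needs revisiting.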

The complexity of interference effects in multicore systems means that designers should expect to need iterative cycles of forming assumptions, testing them, and using the analysis results to form new assumptions. While there is no way to automate this process, engineers who work on multiple projects build expertise both in forming reasonable assumptions about multicore processors and in reassessing those assumptions during investigatory work. Effective reassessment and testing will lead to a well-rounded understanding of how the multicore processor behaves and what affects its timing behavior.

Testing on the real hardware

Multicore CPUs are complex and their internals are often hidden, which makes purely analytical (static-analysis) models of limited use in understanding their timing behavior. While such models can provide usable timing estimates for single-core systems, this is not the case for multicore systems. Even if these methods were used, they would produce highly pessimistic results based on the pathological worst-case behavior of the multicore configuration, and these results would be of no practical use.

To produce usable timing metrics from multicore systems, timing behavior on the system itself must be measured. Engineers at Rapita Systems use a collection of microbenchmarks, developed by the Barcelona Supercomputing Center, to stress specific shared resources and observe the timing behavior of an application when this contention is in place. By applying a configurable degree of contention on specific shared resources using this technology, experiments can be formulated that help analyze timing metrics based on feasible timing environments. These experiments can produce key evidence needed to satisfy CAST-32A timing objectives, such as worst-case execution times (WCET). (Figure 3.)

Figure 3 | By applying a configurable degree of contention on specific shared resources, experiments can be formulated that help analyze timing metrics based on feasible timing environments.
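The shape of such an experiment can be sketched in Python as a sweep over shared resources and contention levels, recording the application's execution time for each configuration and reporting the highest value seen. This is a minimal sketch under stated assumptions: the `measure` callable stands in for whatever target-specific mechanism launches the contenders and times the application, and every name here is hypothetical rather than part of any real tool or of the microbenchmarks mentioned above.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (shared resource name, number of contender cores)
Config = Tuple[str, int]


def sweep_contention(measure: Callable[[str, int], float],
                     resources: Iterable[str],
                     levels: Iterable[int],
                     repeats: int = 3) -> Dict[Config, List[float]]:
    """Run the application under every (resource, level) combination.

    `measure(resource, level)` is a hypothetical hook that applies
    contention on `resource` from `level` contender cores, runs the
    application once on the target, and returns its execution time.
    """
    level_list = list(levels)  # allow reuse across resources
    results: Dict[Config, List[float]] = {}
    for resource in resources:
        for level in level_list:
            results[(resource, level)] = [
                measure(resource, level) for _ in range(repeats)
            ]
    return results


def high_water_mark(results: Dict[Config, List[float]]) -> Tuple[Config, float]:
    """Return the configuration with the largest observed time and that time."""
    worst_config = max(results, key=lambda cfg: max(results[cfg]))
    return worst_config, max(results[worst_config])


# Stand-in for real on-target measurement; the numbers are synthetic.
def fake_measure(resource: str, level: int) -> float:
    per_contender_cost = 50.0 if resource == "l2_cache" else 10.0
    return 1000.0 + per_contender_cost * level


results = sweep_contention(fake_measure, ["bus", "l2_cache"], range(4))
worst_config, worst_time = high_water_mark(results)
```

Note that the largest observed time is a high-water mark, not a proven bound; how much confidence it deserves as a WCET estimate depends on how well the tested configurations cover the interference the deployed system can actually experience.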

Multicore timing analysis cannot be entirely automated

The complexity of multicore processors means that building a fully automated timing analysis solution is unrealistic. While tool support can automate most of the data-gathering and analysis processes, engineering wisdom and expertise are needed to understand the system and direct tool usage to produce the necessary evidence. The more experience engineers have in understanding multicore systems, investigating interference channels, and using supporting tools, the more efficient the analysis process will be.

Mission-critical going forward

Mission-critical embedded systems used in the military sector are increasingly utilizing multicore processors. It is imperative that certification considerations for these systems are not an afterthought, but are addressed early in the development process. Thankfully, DO-178C provides a set of robust objectives for ensuring the safe and reliable use of these processors. Timing analysis for multicore systems is challenging, but tried and tested solutions are available to perform it in a commercial context.

Dr. Guillem Bernat is an expert on execution time analysis for real-time systems with over 70 published papers in international conferences and journals. Since 2004, he has been CEO of Rapita Systems, which provides verification solutions to the global embedded aerospace and automotive industries.

Rapita Systems
www.rapitasystems.com