Static analysis is going deeper with deep-flow data flow - Interview with Jill Britton, Consultancy Services Group Manager at PRQA
In the past 12 months, I’ve interviewed most of the major static analysis companies and can spot the similarities between their approaches. But in this interview with Jill Britton from PRQA, she tosses out a new-to-me term: “deep-flow data analysis.” Read on to see how it differs from (or is similar to) other static analysis tools you may be familiar with. Edited excerpts follow. – Chris A. Ciufo, Editor
MIL EMBEDDED: Let’s start with an overview of what PRQA does, please.
BRITTON: We [primarily] work with static analysis of C and C++ and are particularly interested in coding standards compliance. We’ve been working in this area for more than 20 years. We’re a committee member of MISRA and a committee member for ISO C. We have offices in the U.K., U.S., Netherlands, and India. We also have distribution partners in the Far East and Germany. Our main products are QA-C, QA-C++. We have compliance modules for MISRA, JSF, and high-integrity C++, which is our own product and freely distributed. We have partnerships with products such as VectorCAST for dynamic analysis and S101. We also provide services such as customer coding standards reviews, code audits, and general integration of our products into customers’ own build systems.
MIL EMBEDDED: You said compliance with JSF – what is JSF?
BRITTON: JSF is the Joint Strike Fighter and it has a C++ aeronautical coding standard. As MISRA is to cars, JSF is to the [defense] aeronautical industry.
MIL EMBEDDED: Tell me what is new in the software industry and where your products fit in.
BRITTON: The biggest thing in all the embedded industries is the rapid increase in lines of code. Ten years ago, you had thousands of lines of code; it’s now into the hundreds of thousands of lines of code. Critical safety issues are paramount. It’s very important that all software is tested and is as “safe” as possible, which is why you would have the coding standard. Our tool allows you to check the basic problems of code before it goes into test, so that major issues are avoided. The static analysis takes the code from the engineer completely and runs through it to check for obvious errors and undefined behavior. It replaces some types of peer review and adds an extra level of confidence to the code.
MIL EMBEDDED: What are the key methodologies for code analysis?
BRITTON: Well, one of the new things that is coming to [our tool] is a deep-flow data flow analysis, which takes us into a deeper level of understanding of the code and highlights problems that arise in their own right or are associated with other bugs in design and coding, for example, uninitialized data (attempting to use a variable without first assigning a value to it) and array-bound violations (accessing data beyond the bounds of the array). These can cause some difficult-to-detect bugs. This data-flow analysis is a very powerful technique and is going to be in our new release of QA-C.
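To make the two defect classes Britton mentions concrete, here is a minimal C sketch. It is an illustration only, not PRQA code or QA-C output; the function names and `SAMPLE_COUNT` are hypothetical. The flawed version is disabled with `#if 0` because executing it would be undefined behavior.

```c
#include <stddef.h>

#define SAMPLE_COUNT 4  /* hypothetical array size for this sketch */

#if 0
/* Flawed version, shown for illustration and deliberately not compiled.
 * Defect 1: 'sum' is read before it is ever assigned (uninitialized data).
 * Defect 2: 'i <= SAMPLE_COUNT' reads one element past the end of the
 *           array (array-bound violation). */
int sum_samples_flawed(const int samples[SAMPLE_COUNT])
{
    int sum;
    for (size_t i = 0; i <= SAMPLE_COUNT; i++) {
        sum += samples[i];
    }
    return sum;
}
#endif

/* Corrected version: the accumulator starts at zero and the loop
 * stops at the last valid index, SAMPLE_COUNT - 1. */
int sum_samples_fixed(const int samples[SAMPLE_COUNT])
{
    int sum = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        sum += samples[i];
    }
    return sum;
}
```

Both defects in the flawed version can compile without complaint and may even appear to work in testing, which is why tracing how data flows through the code is useful for finding them.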
MIL EMBEDDED: Is “deep-flow data flow analysis” an industry term and you’re just now implementing it?
BRITTON: “Data flow analysis” is not our proprietary name. It’s an industry name. But “deep-flow data flow analysis” is a term we use ourselves, and it comes from doing analysis on the static code to trace the flow of the data through it.
Our strength is our coding standards. We work from the point of view of educating the user into best practice, rather than merely pointing out bugs. We have an extensive help system that shows the user why one of these messages or errors has come up. Our in-depth knowledge comes from an extensive understanding of the language itself rather than just the code.
MIL EMBEDDED: How do you compare to, say, Coverity or LDRA?
BRITTON: Again, it’s the static code – we concentrate on being excellent at the static code analysis. And for dynamic analysis, we partner with VectorCAST so that we can both bring in what we’re good at.
MIL EMBEDDED: What is the difference between static analysis and dynamic analysis?
BRITTON: Well, dynamic analysis is more about the behavior of the code during execution, actually running the code. Static analysis is performed without running the code but identifies code that is potentially dangerous, overly complex, or difficult to maintain. For PRQA, again, it’s the fact that we look at the language and the constructs of the language rather than what happens when you put data in. Our tool works in a very similar way to the compiler itself, in that it preprocesses the source and can detect things such as infinite loops, signed integer overflows, and so on. We also have this deep-flow data flow analysis, which, although it is not the same as our competitors’ approach, uses some of the same techniques that they use.
MIL EMBEDDED: What are some of the common coding errors that are difficult or impossible to detect?
BRITTON: Well, one is the infinite loop. An infinite loop might be intentional – perhaps if you have a while (1) loop or something you want to just continuously run in a particular task – but it also might not be intentional. Another one is signed integer overflow, which can lead to things such as divide by zero, causing the code to just not execute as expected.
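As a rough illustration of these two error classes, the following C sketch shows an unintended infinite loop and one common way to guard against signed integer overflow. The function names are hypothetical and the code is not taken from PRQA or from any coding standard; the flawed loop is disabled with `#if 0` so that the example stays safe to compile and run.

```c
#include <limits.h>
#include <stdbool.h>

#if 0
/* Unintended infinite loop, deliberately not compiled.
 * An unsigned counter is always >= 0, so the condition never becomes
 * false; after i reaches 0 it wraps to UINT_MAX and indexes far
 * outside the buffer. */
void clear_buffer_flawed(int *buf, unsigned int len)
{
    for (unsigned int i = len - 1; i >= 0; i--) {
        buf[i] = 0;
    }
}
#endif

/* Terminating version: the loop ends when the counter reaches zero. */
void clear_buffer(int *buf, unsigned int len)
{
    for (unsigned int i = len; i > 0; i--) {
        buf[i - 1] = 0;
    }
}

/* Signed overflow is undefined behavior in C, so test the operands
 * before adding. Returns false and leaves *result untouched if a + b
 * would not fit in an int. */
bool add_checked(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}
```

An intentional `while (1)` task loop looks very different from the flawed loop above, where the condition appears meaningful but can never fail.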
Another thing not necessarily obvious is that the compiler is not infallible. Within our static analysis tool, we can find code errors that the compiler allowed through. This is where type checking comes in, catching possible signed integer overflows and type misuses.
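One familiar example of valid C that a compiler accepts yet that behaves unexpectedly is a comparison that mixes signed and unsigned types. The sketch below is illustrative only; the function names are hypothetical, and it does not claim to reproduce any specific message from PRQA's tools.

```c
#include <stdbool.h>

/* Compiles as valid C, but 'value' is implicitly converted to
 * unsigned int before the comparison. A negative value becomes a very
 * large unsigned number, so -1 is reported as NOT below a limit of 10. */
bool is_below_limit_flawed(int value, unsigned int limit)
{
    return value < limit;
}

/* Corrected version: handle the negative case explicitly, then compare
 * two unsigned values so that no implicit conversion is involved. */
bool is_below_limit(int value, unsigned int limit)
{
    if (value < 0) {
        return true;  /* any negative value is below an unsigned limit */
    }
    return (unsigned int)value < limit;
}
```

With these definitions, `is_below_limit_flawed(-1, 10u)` returns false while `is_below_limit(-1, 10u)` returns true. Some compilers report the mixed comparison only when extra warnings are enabled, so a defect of this kind can pass through a default build.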
MIL EMBEDDED: Let’s talk about defense systems. How do defense applications differ from other applications?
BRITTON: There are three key issues. Firstly, the safety-critical aspect is very important with defense applications. Secondly, the timescale used on military projects is often much longer than for other embedded systems. For example, an automotive product might be on the market within two years, whereas military projects might take five years or longer. So there is maybe a slower adoption of changes. And finally, there’s the whole issue of using legacy code.
MIL EMBEDDED: Let’s talk about each of these three scenarios you’ve described – safety-critical code and more use of legacy software combined with longer timescale.
BRITTON: For safety-critical systems, software must be reliable, tested, and infallible in the field. There are many ways of addressing this. One important thing is the process that is gone through [certification testing] before the code is released, like DO-178B. PRQA’s tools are qualified for use in DO-178B. DO-178C is coming out shortly, which is enhancing this field again. Static analysis is a very important part of this because it can remove problems before even getting to test and reduces the testing cycle, allowing concentration on more detailed and definite testing.
Relative to legacy code within the military environment, there’s often an attitude of “If it isn’t broken, don’t fix it.” The legacy code doesn’t undergo as rigorous standards testing as new code does – purely because standards didn’t exist when it was created; however, it can pretty much be taken onboard that the legacy code that has been in a product maybe for 20 years is reasonably likely to be safe.
MIL EMBEDDED: So what happens to the legacy code as far as static analysis goes?
BRITTON: So, with our static analysis tool we will exclude that code from the analysis on the basis that you may cause more trouble by making changes to that code than you would solve by fixing anything that’s found. We call this “baselining”: We take a snapshot and say, “From this point onward, anything new that goes in will be checked to this coding standard.” Obviously, legacy code can be brought in step by step. Very often this isn’t done, purely because you don’t change working code that has been used safely for several years. Traceability is always very, very important, too.
MIL EMBEDDED: Is this baselining you mentioned unique to your tool? Are we talking about 20-year-old C code or Pascal?
BRITTON: With this kind of concept, different companies call it different names. We only analyze C and C++, so if it’s Pascal code, we wouldn’t analyze it. We see a lot of C in military products, more than C++, although C++ is increasing in that world. With legacy code, if you want to go in and clean it up, the baseline can be moved back and allow more code to be changed. This can be risky and is not a particularly useful exercise.
MIL EMBEDDED: Some big trends in defense are multicore, partitioned RTOSs, and virtualization. How have these affected coding practices, and how do tools manage them?
BRITTON: We have dealt with multicore systems and we are currently working on improvements to the system working with multicore. Certainly, regarding the partitioned OS, our product basically runs where you put it and takes the code as presented to it. The whole goal – all the new trends, virtualization, everything – is to head toward safer, speedier, and easier-to-understand, top-level systems.
MIL EMBEDDED: Let’s polish the crystal ball: Which trends do you see going forward?
BRITTON: Well, I think there will be more software, more lines of code, more standards, and developments in language. Particularly in some of the U.S. universities, there’s a lot of interesting work going on with new programming languages that may be used in future embedded applications. Another trend is the cloud. I’m not sure how that will affect the embedded world, but it needs to be considered. And finally, I think there will be a transfer to more object-oriented languages using C++, rather than C.
PRQA 617-273-8448 www.programmingresearch.com