Data-centric mediation tools resolve the legacy source-code conundrum

At the 2012 Interoperable Open Architecture event, Stan Schneider, CEO of RTI, stated that the single most effective way to control software cost is to establish a stable team working on a single code base. Government acquisition policies and contracts have assumed that owning source code equated to control of portability, extensibility, and interoperability. Unfortunately, that control is an illusion. The reality is that the DoD now has millions of lines of code, divorced from their original architects and developers, that the government seeks to reuse – and it often costs more to reuse that code than to develop from scratch.

Gartner estimates that manually migrating (changing language or hardware platform) a single Source Line of Code (SLOC) costs between $6 and $26, at a rate of 160 SLOC per day. For example, the government's Blue Force Tracker program had 1.5 million legacy SLOC. At these rates, a manual migration would cost between $9M and $39M and take 9,375 days to complete. Clearly these figures are unsustainable in today's constrained fiscal environment. However, a methodology supported by data-centric mediation tools can combat the challenge.
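The arithmetic behind those figures is straightforward; a minimal sketch, using the Gartner rates and the Blue Force Tracker line count quoted above:

```python
# Back-of-the-envelope migration cost, using the Gartner figures quoted above.
SLOC = 1_500_000               # legacy SLOC in Blue Force Tracker
COST_LOW, COST_HIGH = 6, 26    # dollars per manually migrated SLOC
RATE = 160                     # SLOC migrated per day

cost_low = SLOC * COST_LOW     # low-end total cost
cost_high = SLOC * COST_HIGH   # high-end total cost
days = SLOC / RATE             # calendar effort at the quoted rate

print(f"${cost_low:,} to ${cost_high:,} over {days:,.0f} days")
# prints "$9,000,000 to $39,000,000 over 9,375 days"
```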

What’s in the code

A code reuse exercise begins with an analysis of which elements can or should be reused. Next, the new engineers need to understand what the code is really doing, both explicitly and implicitly, not just what the documentation says it does. This can be a costly and error-prone exercise. Each time a code base is reused by a new development team, the team undertakes this exercise again, and often makes the same mistakes, or new ones. This is why stable teams are extremely important. However, teams cannot always be kept stable, which raises the question: How do we avoid this relearning exercise and make code reuse a viable, cost-effective tool in the armory of acquisition and life-cycle cost reduction?

The short answer is to rigorously formalize the code reuse process and accumulate the understanding of how to reuse the code base in a computer-consumable form.

Assuming no functional change in requirements, legacy code reuse needs an approach that can assess a subsystem or application, modularize it, and ideally reuse it with little or no change to its software: in effect, treat it as a black box and accomplish integration through mediation, not through changes to the application's code and logic.
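The black-box idea can be illustrated with a small sketch. All names here are hypothetical; the point is that the mediation layer translates data at the boundary while the legacy routine itself is never modified:

```python
import math

def legacy_compute_track(lat_deg, lon_deg):
    """Stand-in for an untouched legacy routine: expects degrees,
    returns a rounded (lat, lon) tuple. Its code is never edited."""
    return (round(lat_deg, 4), round(lon_deg, 4))

class TrackMediator:
    """Hypothetical mediation layer: adapts the new system's data model
    (radians, keyed samples) to the legacy black-box interface (degrees)."""
    def publish_position(self, sample):
        # Translation happens here, at the boundary, not inside the legacy code.
        lat = math.degrees(sample["lat_rad"])
        lon = math.degrees(sample["lon_rad"])
        return legacy_compute_track(lat, lon)
```

Integration changes are then confined to the mediator; a new system publishing `{"lat_rad": 0.0, "lon_rad": math.pi / 2}` reaches the legacy routine as `(0.0, 90.0)` without a single edit to legacy logic.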

What’s in the messages

All government software programs incorporate Interface Control Documents (ICDs) that define the structure of data that drives or informs a system, or is output by a system. Unfortunately, this is insufficient for interoperability across systems, and it couples an application to implementation-specific characteristics such as timing, implied and expected behavior, implicit knowledge of state transitions, and so on.
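The gap is easy to see in miniature. Below is a hypothetical ICD entry rendered as a data structure; the fields are everything the ICD states, while the trailing comments note the implicit contract the ICD does not state, which is exactly what couples consumers to the implementation:

```python
from dataclasses import dataclass

@dataclass
class PositionReport:
    """Hypothetical message from an ICD: the syntax is fully specified."""
    track_id: int      # ICD: 32-bit unsigned identifier
    lat_deg: float     # ICD: WGS-84 latitude, degrees
    lon_deg: float     # ICD: WGS-84 longitude, degrees
    timestamp_ms: int  # ICD: milliseconds

    # NOT in the ICD, but silently assumed by the legacy consumer:
    #  - reports arrive at 1 Hz; a 3-second gap implies "track lost"
    #  - timestamp_ms counts from system boot, not UTC
    #  - (0.0, 0.0) is a "no fix" sentinel, not a real position
```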

What’s in the data

An alternative approach is to characterize a software subsystem through its data (structure, context, and behavior), leverage the ICDs, and then reuse the legacy code base. In other words, rather than thinking about the code and how to reuse it, simplify the problem by looking at the system data in the context of operational use, characterizing it, and then isolating the legacy code base behind a data-centric interface to accomplish reuse and integration through mediation. Because this approach captures the semantics of the unspecified elements of how the code actually operates, and captures them once in a rigorously defined, computer-readable format, it removes the relearning problem described earlier. The understanding of a specific function or application within a code base can then be built on over time as more and more of the original code base is reused.
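One way to picture "computer-readable semantics" is a model that records what a team learned about a legacy interface, so mediation logic can be driven from the model rather than rediscovered and hard-coded by each new team. The schema and field names below are illustrative only, not any program's standard:

```python
# Hypothetical semantic model for a legacy position-report interface:
# units, rates, and sentinel values the original ICD left implicit.
TRACK_SEMANTICS = {
    "message": "PositionReport",
    "units": {"lat_deg": "degrees", "lon_deg": "degrees"},
    "rate_hz": 1.0,
    "staleness_s": 3.0,   # silence beyond this implies "track lost"
    "sentinels": {"no_fix": {"lat_deg": 0.0, "lon_deg": 0.0}},
}

def is_no_fix(sample, model=TRACK_SEMANTICS):
    """Mediation check driven by the captured model, not per-consumer code."""
    sentinel = model["sentinels"]["no_fix"]
    return all(sample.get(field) == value for field, value in sentinel.items())
```

Because the model is data, it can be versioned and extended as later teams learn more, which is how the understanding "builds over time" described above.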

The way forward

Leading defense programs, such as the DoD's Unmanned Control Segment (UCS), the Open Group's Future Airborne Capability Environment (FACE) initiative, the Army's Common Operating Environment (COE), and the Navy's Open Systems Architecture (OSA), all seek to ensure future code reuse by starting with this type of semantic data model in the system architecture and software infrastructure, which will mitigate future needs to semantically characterize legacy modules of code. And because ICD syntax does not dictate semantics, and vice versa, the lessons learned in building these software architectures can be applied to legacy code repositories: semantic data models can be added to represent the understanding of how legacy code actually works inside its black box. This can be accomplished with data-centric mediation tools, such as RTI Routing Service 3.0, to facilitate integration into new systems. Ideally, these tools should be based on open standards such as the Data Distribution Service (DDS).

Gordon Hunt is Chief Applications Engineer at RTI. He can be contacted at