Metadata: When target video data is not enough

The pilot of a manned aircraft has the navigation equipment, the sensor inputs, and the smarts to know where the aircraft is and where the target is. The pilot can launch a missile against an enemy site or drop a bomb on a target and complete a mission with a relatively high degree of confidence. Multiple technologies augment the pilot’s situational awareness and support the pilot’s command-and-control authority. Supplemental data overlaid on cockpit displays is useful but not critical to completing the mission.

What if the pilot is thousands of miles away from the target, flying the airplane with a joystick and scanning the terrain and the threats on a video screen? In today’s remote and virtualized combat scenarios, it is more difficult for fliers to understand the imagery they are seeing without metadata – that is, data about the data. The proliferation of unmanned air and ground vehicles is making the rapid encoding, compression, transmission, decoding, dissemination, and display of metadata every bit as important as the real-time streaming video.

In remotely flown missions, metadata can mean the difference between hitting the target and hitting something else. Sophisticated as today’s sensors are, their imagery could be misleading or downright dangerous without some indication of its larger context. At the other end of a thousand-mile loop, the decision maker looking at video of a suspected enemy vehicle moving down a road, for example, needs to know the vehicle’s heading in relation to other objects of interest. The operator may also want information such as the geographical coordinates of the image, the local time at the site, and the identification number of the sensor host.

While the video is the fundamental data – the sine qua non of the mission – its inherent ambiguities make it, by itself, insufficient to act upon. Simultaneously processed metadata – and with metadata, the more, the better – is key to obtaining desired outcomes.

Metadata can be as simple as the audio files accompanying video files, closed-captioning for the hearing-impaired on a television, subtitles on foreign films, or time and date notations on emails. Metadata can also be as demanding as real-time directional and source cues for streaming video footage from unmanned platforms. It can indicate the GPS location, the time and date, the orientation of the camera on the platform, the host vehicle’s altitude and airspeed, and much more. It can originate from within the video stream or from external sources, such as sensors, tracking devices, or other computers.

Processing challenges

Metadata has been used since the dawn of video and exists in a large number of video transmission formats. The sheer variety of coding and decoding methods, however, creates a challenge for metadata processing technology: how to capture and pass along all the formats from internal and external sources? Moreover, how to do this in both directions, inserting as well as extracting metadata, with minimal latency while still performing the primary task of video processing?

Most metadata processing engines today are tied to specific formats, such as the KLV (Key-Length-Value) standard for metadata insertion and extraction, NATO’s STANAG 4609, or Cursor on Target (CoT). (Efforts to unify around one format have not yet borne fruit.) Format-specific, hardware-based processing platforms add speed but do not necessarily sustain that performance as formats and requirements change over time. Retrofitting new capabilities can lead to costly mid-cycle redesigns. Most metadata processing is also unidirectional, either inserting or extracting this information rather than performing both operations simultaneously.
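To make the Key-Length-Value idea concrete, the following is a minimal sketch of bidirectional KLV handling, assuming the common convention of a 16-byte key followed by a BER-encoded length and then the value bytes. The function names and the key DEMO_KEY are hypothetical placeholders chosen for illustration; they are not taken from any product or registered key dictionary.

```python
KEY_LEN = 16  # assumption: 16-byte universal keys

# Placeholder key for illustration only; not a registered metadata key.
DEMO_KEY = bytes(range(KEY_LEN))


def encode_ber_length(n: int) -> bytes:
    """Encode a length in BER short form (< 128) or long form."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def decode_ber_length(buf: bytes, pos: int):
    """Return (length, position of the first value byte)."""
    if pos >= len(buf):
        raise ValueError("truncated packet: missing length")
    first = buf[pos]
    pos += 1
    if first < 0x80:
        return first, pos
    count = first & 0x7F
    if count == 0 or pos + count > len(buf):
        raise ValueError("unsupported or truncated length field")
    return int.from_bytes(buf[pos:pos + count], "big"), pos + count


def insert_klv(key: bytes, value: bytes) -> bytes:
    """Insertion direction: wrap a value as one KLV triplet."""
    if len(key) != KEY_LEN:
        raise ValueError("key must be 16 bytes")
    return key + encode_ber_length(len(value)) + value


def extract_klv(buf: bytes):
    """Extraction direction: split a buffer into (key, value) pairs."""
    items, pos = [], 0
    while pos < len(buf):
        if pos + KEY_LEN > len(buf):
            raise ValueError("truncated packet: missing key")
        key = buf[pos:pos + KEY_LEN]
        length, pos = decode_ber_length(buf, pos + KEY_LEN)
        if pos + length > len(buf):
            raise ValueError("truncated packet: missing value bytes")
        items.append((key, buf[pos:pos + length]))
        pos += length
    return items
```

Because the extractor reads only the key and the length before skipping ahead, it can step over values whose meaning it does not understand, which is what makes this kind of framing attractive for carrying mixed metadata.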

The embedded logic necessary to capture metadata of any type or length is also more complex than the coding required to lock onto specific, fixed formats and ignore everything else.

An example of the format-agnostic approach to metadata processing is the GE Intelligent Platforms ICS-8580, a ruggedized XMC video streaming module newly updated with a firmware-based metadata processing engine. The engine is format-agnostic and bidirectional, with throughput rates as fast as 500 Kbytes/sec, alongside two to eight Mbits/sec throughput for 100x-compressed video data (Figure 1).

Given today’s medley of coding/decoding algorithms, the best strategy may be to capture all types of metadata and let the military applications sort out which ones to translate and display. This approach allows future growth while insuring against costly refresh projects in midstream.
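The capture-everything strategy can be sketched in a few lines. This is an illustrative outline only, not a description of any vendor's firmware: the class MetadataRouter and the format tags "text" and "mystery" are hypothetical. The point it shows is that records in formats nobody has registered a decoder for are retained rather than discarded, so an application added later can still use them.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, bytes]  # (format tag, raw payload); tags are hypothetical


class MetadataRouter:
    """Pass every captured record along; decode only what is registered."""

    def __init__(self) -> None:
        self._decoders: Dict[str, Callable[[bytes], object]] = {}
        self.unhandled: List[Record] = []  # kept, never dropped

    def register(self, fmt: str, decoder: Callable[[bytes], object]) -> None:
        """An application opts in to a format it knows how to translate."""
        self._decoders[fmt] = decoder

    def route(self, records: List[Record]) -> List[Tuple[str, object]]:
        """Decode known formats; retain unknown ones untouched."""
        decoded = []
        for fmt, payload in records:
            decoder = self._decoders.get(fmt)
            if decoder is None:
                self.unhandled.append((fmt, payload))
            else:
                decoded.append((fmt, decoder(payload)))
        return decoded


router = MetadataRouter()
router.register("text", lambda raw: raw.decode("ascii"))
shown = router.route([("text", b"N34"), ("mystery", b"\x01\x02")])
```

Here the "text" record is translated for display while the "mystery" record stays in router.unhandled, available to any decoder registered in a later software update.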

Figure 1: The rugged ICS-8580 video compression XMC processes metadata as well as video.

www.gedefense.com