Data indexes boost embedded software's performance and efficiency

Use of a data index can improve an embedded application’s lookup speed an order of magnitude over linear search. The best-known general-purpose index is the B-Tree, but specialized indexes such as R-Trees and Patricia tries are ideal for many diverse application functions such as mapping and IP filtering.

To find information in a book, what makes more sense: reviewing every page for the sought-after content, or consulting the index, to discover exactly where to look?

Surely the latter makes more sense, and should be equally smart. Today’s embedded applications manage increasing volumes of complex data. Finding the right data – whether to route network packets, calculate distance to a point on a map, or achieve some other processing goal – typically must be achieved under real-time deadline pressure.

Fortunately, programmers can use data indexes to improve data lookup performance logarithmically over linear search. Use of indexes has increased as off-the-shelf Database Management Systems (DBMSs) have grown to play a larger role in embedded systems development, with most embedded databases supporting one or more index type.

However, in many projects, the first and often the only index considered is the B-Tree. This is because among the large existing variety of indexes, only B-Tree indexes are offered universally. There is no denying B-Trees’ efficiency for basic search operations like exact match, prefix, and range searches. However, for purposes as varied as IP routing, mapping, and query-by-form algorithm development, less common index types can be a better fit, resulting in faster performance and more efficient use of processing resources such as CPU cycles (see Sidebar 1).

21
Sidebar 1: B-Trees are not the only index choice for embedded applications.
(Click graphic to zoom by 2.4x)

The following scenarios illustrate how two of the less common index types – the R-Tree and the Patricia trie – are used.

Spatial indexing and the R-Tree

Mapping and other location-based functions are common in mobile and embedded applications ranging from battle support systems in combat vehicles, to mobile phone apps for finding the nearest pizza joint. Such applications are based on algorithms that perform spatial searches such as locating the nearest object to a current location, finding all objects near the user, and so on.

B-Tree indexes are single-dimensional: They can’t handle the R2 or R3 coordinates used in spatial searches. A much better alternative is the R-Tree, also named Guttman’s R-Tree after its inventor. It maps objects in space using a “bounding box.” If an object is represented by point coordinates (X, Y), then its bounding box is a degenerated rectangle in which the width and height are one.

For all other geographical objects (lines, polygons, and other shapes), the bounding box is such that the top-left corner coordinates are smaller than or equal to the coordinates of any point of the object, and the coordinates of the bottom-right corner are greater than or equal to the coordinates of any point of the object. In other words, a wrapping rectangle is the smallest rectangle that can fully contain a given object.

R-Tree indexes are commonly used to speed spatial searches (to discover the rectangle that bounds a given point, find all rectangles that overlap a given rectangle, and so forth). The application can manage all manner of shapes with help from the bounding box.

Using eXtremeDB’s database definition language, a spatial object can be described as follows:

/* structure representing point on the map */

struct Point

{

double latitude;

double longitude;

};

class Street {

/* vector of points specifying the street */

vector<Point> points;

/* Street wrapping rectangle */

rect<double> wrap_rect;

rtree<wrap_rect> streets_idx;

};

If we want to add a new street, the application must store the street’s coordinates and calculate its bounding box:

Street street;

mco_rect_t wr;

wr.l.x = DOUBLE_MAX;

wr.l.y = DOUBLE_MAX;

wr.r.x = DOUBLE_MIN;

wr.r.y = DOUBLE_MIN;

Street_new(trans, &street);

Street_points_alloc(&street, n_points);

for (i = 0; i < n_points; i++)

{

if (points[i].latitude < wr.l.latitude) {

wr.l.latitude = points[i].latitude;

}

if (points[i].longitude < wr.l.longitude) {

wr.l.longitude = points[i].longitude;

}

if (points[i].latitude > wr.r.latitude) {

wr.r.latitude = points[i].latitude;

}

if (points[i].longitude > wr.r.longitude) {

wr.r.longitude = points[i].longitude;

}

Street_points_put(&street, i, &points[i]);

}

Street_wrap_rect_put(&street, &wr);

If a user searches for a location, the mapping application presents the result (streets) in a window that corresponds to a map rectangle with coordinates |<min_longitude, min_latitude, max_longitude, max_latitude>|.

mco_rect_t r;

mco_cursor_t c;

MCO_RET rc;

r.l.x = min_longitude;

r.l.y = min_latitude;

r.r.x = max_longitude;

r.r.y = max_latitude;

if (Street_streets_idx_search(trans, MCO_EQ, &c, (double*)&r) == MCO_S_OK)

{

for (; rc == MCO_S_OK; rc = mco_cursor_next(trans, &c))

{

Street street;

Street_from_cursor(trans, &c, &street);

// display it

}

}

Patricia trie

The B-Tree can locate keys with a specified prefix, for example, finding businesses with names that start “AAA.” But some applications must search for keys that represent the longest prefixes of a specified value. The B-Tree can accomplish this, but only by performing several iterations and searching for different prefixes of the specified value, starting from the longest.

A much more efficient index for prefix searches is the Practical Algorithm to Retrieve Information Coded in Alphanumeric trie (from retrieval) also known as Patricia trie. A Patricia trie is a variation of a binary tree and typically is used for phone routing and IP filtering. In the first case, given an incoming call and a table of operators with known prefixes, an application must select the right operator to receive the call. In the second case, given IP masks for valid/rejected domains, a received HTTP request should be classified as accepted, rejected, redirected, and so on. The following database schema defines a routing table, with a mask represented by a vector of bits (Booleans).

class Route

{

Vector<bool> dest;

uint4 gateway;

uint4 interf;

uint2 metric;

unique patricia<dest> by_dest;

};

To locate the proper route for the received IP address, the application searches eXtremeDB as follows, using a Patricia trie:

mco_cursor_t csr;

if (MCO_S_OK == Route_by_dest_index_cursor(trans, &csr)) {

uint1 mask[4];

make_mask(mask, ip, 32);

/* find routes which mask match this IP address */

if (MCO_S_OK == Route_by_dest_prefix_match(trans, &csr, mask, 32);

Route route;

Route_from_cursor(trans, &csr, &route);

...

}

}

The following code converts the integer number representing the IP address into an array of bits:

void make_mask(uint1* mask, uint4 val, int bitnum)

{

int i;

val = val >> (32-bitnum);

memset(mask, 0, 4);

for (i = 0; i < bitnum; i++, val = val >> 1)

{

mask[i >> 3] |= (val&1) << (i&7);

}

}

The more index choices, the better the results

Knowledge of specialized indexes such as the R-Tree and the Patricia trie enables faster development, more efficient code, and the ability to work with more complex data structures. While the well-known B-tree is sufficient for many “plain vanilla” searching tasks, less well-known indexes can do a better job for specific application and data types: The R-Tree is highly efficient in handling mapping and geospatial data. Additionally, the Patricia trie is ideal for phone routing, IP filtering, and other tasks that must be accomplished by locating keys that represent the longest prefixes of a specified value.

Steve Graves is cofounder and CEO of McObject, a company specializing in embedded Database Management System (DBMS) . Prior to McObject, Steve was president and chairman of Centura Solutions Corporation and vice president of worldwide consulting for Centura Software Corporation (NASDAQ: CNTR). He also served as president and chief operating officer of Raima Corporation. He can be contacted at steve.graves@mcobject.com.

Konstantin Knizhnik is a software engineer at McObject, where he is involved in the development of the eXtremeDB embedded database system. He is also the author of several popular open-source projects including the Java-native database management systems Perst and Perst Lite, as well as tools that extend the capabilities of programming languages including Java, C++, and C#. Konstantin can be reached at knizhnik@mcobject.com.

McObject 425-888-8505 www.mcobject.com

Topics covered in this article