What is a tensor and why should I care?

Over time, the definition of a tensor has varied across communities from mathematics to quantum physics. Lately, it has joined the machine learning community’s lexicon. If you search the web for the definition of a tensor, you will likely be overwhelmed by the varying explanations and heated discussions. In 1900, Gregorio Ricci-Curbastro and his student Tullio Levi-Civita first published their theory of tensor calculus, which is also known as absolute differential calculus.

The importance of tensor calculus became apparent in 1915 when physicist Albert Einstein revealed that he had found it indispensable for the gravitational field equations used in his theory of general relativity. From March to May of 1915, Einstein and Levi-Civita wrote to each other, their correspondence filled with complex mathematical equations, proofs, and counterproofs. All 11 letters that Einstein wrote to Levi-Civita have survived, while only one of Levi-Civita’s letters still exists. To honor Levi-Civita, the mathematical permutation symbol, ε_ijk, used in tensor calculus, is known today as the Levi-Civita symbol.

One way to understand the importance of tensor calculus is to consider geometric complications when drawing right angles. If you are developing a system that uses the flat-earth model, you can draw right angles using the Pythagorean Theorem. The limits of the Pythagorean Theorem become clear when you try to draw a right angle on a spherical surface. In this case, the Pythagorean Theorem no longer works. It’s here that the metric tensor comes to the rescue. It generalizes coordinates and geometries so that distance can be measured in any given space. The magic of tensors comes from their special transformational properties that enable them to describe the same physics in all reference frames.
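A small numerical sketch can make the metric tensor's role concrete. The code below, with illustrative values chosen for this example, compares a naive Pythagorean distance between two nearby points on a unit sphere (treating the angular coordinates as if they were flat Cartesian ones) against the distance computed with the sphere's metric tensor:

```python
import numpy as np

# Two nearby points on a unit sphere in (theta, phi) coordinates
# (polar angle theta, azimuthal angle phi). Values are illustrative only.
theta = np.pi / 4
dtheta, dphi = 0.0, 0.01  # a small step in the phi direction

# Naive "flat" Pythagorean distance treats the coordinates as Cartesian:
flat = np.sqrt(dtheta**2 + dphi**2)

# The metric tensor of the unit sphere, g = diag(1, sin^2(theta)),
# weights each coordinate direction correctly for this geometry:
g = np.diag([1.0, np.sin(theta) ** 2])
step = np.array([dtheta, dphi])
curved = np.sqrt(step @ g @ step)

print(flat)    # 0.01 -- overestimates the distance at this latitude
print(curved)  # ~0.00707 -- the true arc length
```

The same step of 0.01 in the phi coordinate covers less actual distance near the pole than at the equator; the metric tensor encodes exactly that correction.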

Think of a tensor as a multi-linear map. Given a set of coordinates (or, more generally, functions or other objects), each coordinate can be transformed according to a set of rules (linear transformations) into a new set of coordinates. The key here is that each coordinate can have a unique transformation; for example, you can stretch or distort different coordinates in different ways. If we take a rectangular piece of bubble gum with edges on the x, y, and z-axes and squeeze it along the x-axis (a one-dimensional input), the x dimension will compress by a certain amount while the y and z dimensions expand by a given amount. This produces output changes in three dimensions while maintaining a constant volume. Assuming the squeezing response is linear, the behavior can also be calculated using a metric tensor even when the gum is squeezed off-axis.
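The bubble gum squeeze can be sketched as a linear map. The scale factors below are hypothetical, chosen only so that the volume works out to be constant (the determinant of the map is 1):

```python
import numpy as np

# A linear map modeling the "bubble gum squeeze": compress x by half,
# let y and z expand so total volume is preserved (determinant = 1).
# The factors are hypothetical, chosen only so the volume is constant.
T = np.diag([0.5, np.sqrt(2.0), np.sqrt(2.0)])

# Unit edges of the original rectangular piece of gum along x, y, z:
edges = np.eye(3)
squeezed = T @ edges       # each coordinate transforms independently

volume_scale = np.linalg.det(T)
print(squeezed.diagonal())  # [0.5, 1.414..., 1.414...]
print(volume_scale)         # 1.0 -- the volume is unchanged
```

A one-dimensional input (pressure along x) produces output changes in all three dimensions, exactly as described above.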

Tensors are the most fundamental data type used in all major AI frameworks. A tensor is a container shaped to fit our data exactly, with its shape defining the maximum amount of data it can hold. Take the example of a processor that has a temperature of 45 degrees. First, we need to ask whether we are talking about degrees Centigrade or Fahrenheit. The complete answer requires a “denominate” number, which is a number with a name. In this case, the number 45 should be “named” as representing a certain number of degrees Centigrade. This is what is known as a scalar (or 0D) tensor, a tensor with zero dimensions. In contrast, a 1D tensor, better known as a vector, is implemented as an array that stores data in a single row or column. The best-known definition of a vector is an object that has both magnitude and direction. A more detailed description of a vector is a member of a space where:

  1. An operation exists that maps two elements in that space to another element in the space.
  2. An operation exists that maps an element in the space and a scalar to another element in that space.
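These ideas translate directly into code. The sketch below shows a 0D scalar tensor (the denominate temperature) and a 1D tensor, along with the two vector-space properties just listed:

```python
import numpy as np

# A scalar (0D tensor): a single denominate number -- 45 degrees Centigrade.
temperature_c = np.array(45.0)

# A 1D tensor (vector): data stored in a single row of values.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])

print(temperature_c.ndim)  # 0
print(v.ndim)              # 1

# Property 1: adding two vectors yields another vector in the same space.
print(v + w)               # [5. 7. 9.]

# Property 2: scaling a vector by a scalar also stays in the space.
print(2.0 * v)             # [2. 4. 6.]
```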

The next step in the hierarchy is the matrix, which is a two-dimensional tensor; from there, tensors extend to any number of dimensions. An example of a 3D tensor (or cube) is time series data used with radar processing, which has three parameters (time, frequency, and channel). A two-dimensional JPG image, described by width, height, and depth (color), can be expressed as a 3D tensor. Adding the number of pictures to process increases it to a 4D tensor. A collection of videos would be stored as a 5D tensor (number of videos, number of frames per video, width, height, and depth). As these images pass through the deep learning layers, they can be broken down into hundreds of features, thus expanding the number of dimensions.
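The hierarchy is easiest to see in the shapes themselves. The sizes below are hypothetical, chosen only to illustrate each level:

```python
import numpy as np

# Hypothetical shapes illustrating the dimensional hierarchy:
matrix = np.zeros((3, 4))                  # 2D tensor (matrix)
radar = np.zeros((100, 64, 8))             # 3D: (time, frequency, channel)
image_batch = np.zeros((32, 224, 224, 3))  # 4D: (picture, width, height, depth)
videos = np.zeros((10, 300, 224, 224, 3))  # 5D: (video, frame, width, height, depth)

for t in (matrix, radar, image_batch, videos):
    print(t.ndim, t.shape)
```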

But beware: even though tensors and matrices appear similar, they are not the same. The following TensorFlow snippet highlights the difference (Figure 1). NumPy is a general-purpose, array-processing package for N-dimensional arrays in Python. First, we create a 2×2 matrix and initialize its elements to powers of two. Likewise, we create a tensor to perform the same operation. The output of the matrix is as expected; the tensor output, in contrast, creates a computational graph, which serves as a roadmap to the final answer (we’ll discuss more about computational graphs in a later column). Evaluating the resulting graph produces the expected answer.

Figure 1.
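The original TensorFlow snippet is not reproduced here, so as a hedged stand-in the sketch below mimics the contrast in pure NumPy and Python: the matrix evaluates eagerly, while a minimal hand-rolled node defers the computation the way a graph does. The choice of squaring the elements is an assumption made for illustration:

```python
import numpy as np

# A 2x2 matrix whose elements are powers of two (as in Figure 1):
matrix = np.array([[1, 2], [4, 8]])

# NumPy evaluates eagerly -- squaring the elements returns numbers at once:
print(matrix ** 2)  # [[ 1  4] [16 64]]

# A graph-style tensor instead records the operation for later evaluation.
# This tiny class sketches that deferred behavior:
class Node:
    def __init__(self, fn):
        self.fn = fn        # stores the computation, not its result

    def eval(self):
        return self.fn()    # running the "graph" produces the answer

square_node = Node(lambda: matrix ** 2)
print(square_node)          # prints a Node object -- a roadmap, not numbers
print(square_node.eval())   # evaluation yields the expected answer
```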

A decomposition is a schema for expressing a tensor as elementary operations between simpler tensors. A tensor decomposition is unique whenever its components are linearly independent; in contrast, a matrix decomposition is unique only when its components are orthogonal. Compared to traditional matrix-based code, tensor-based modeling is faster and requires less memory.
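As a familiar matrix-side example of this idea, the singular value decomposition expresses a matrix as a product of simpler factors whose components are orthogonal. This is only a sketch of the matrix case, not a tensor decomposition:

```python
import numpy as np

# SVD expresses M as U @ diag(s) @ Vt, with orthogonal U and Vt:
M = np.array([[3.0, 1.0], [1.0, 3.0]])
U, s, Vt = np.linalg.svd(M)

# The components are orthogonal...
print(np.allclose(U.T @ U, np.eye(2)))      # True
print(np.allclose(Vt @ Vt.T, np.eye(2)))    # True
# ...and the simpler factors reassemble the original matrix:
print(np.allclose(U @ np.diag(s) @ Vt, M))  # True
```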

Tensor functions fall into one of four main categories: reshaping, element-wise operations, reduction, and access. The tensor reshaping operations include squeeze, unsqueeze, flatten, and reshape. Combining a tensor with another tensor will also reshape it. Consider the similarity between reshaping the tensors in a deep learning model and the earlier bubble gum example!
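The reshaping operations named above can be sketched with their NumPy equivalents (squeeze and unsqueeze are the PyTorch names; NumPy calls the latter expand_dims):

```python
import numpy as np

t = np.zeros((1, 2, 3))

squeezed = np.squeeze(t)           # drop size-1 axes        -> shape (2, 3)
unsqueezed = np.expand_dims(t, 0)  # add a size-1 axis       -> shape (1, 1, 2, 3)
flattened = t.flatten()            # collapse to 1D          -> shape (6,)
reshaped = t.reshape(3, 2)         # reinterpret the layout  -> shape (3, 2)
stacked = np.stack([t, t])         # combine with another tensor -> (2, 1, 2, 3)

for name, arr in [("squeeze", squeezed), ("unsqueeze", unsqueezed),
                  ("flatten", flattened), ("reshape", reshaped),
                  ("stack", stacked)]:
    print(name, arr.shape)
```

Note that every one of these operations preserves the total number of elements it keeps; like the bubble gum, the data is redistributed rather than created or destroyed.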

Depending on your mathematical background, your definition and understanding of a tensor may vary. Some might even be disappointed with the lack of equations here. If so, please check out the NASA paper, “An Introduction to Tensors for Students of Physics and Engineering,” by Joseph C. Kolecki.

Perhaps the best definition of a tensor comes from a regular poster on the website Ars Technica: “Basically, a tensor is a matrix of equations, instead of a matrix of pure numbers. Tensor mathematics is the manipulation of these equation matrices as a method of solving ALL of the involved equations.”

Sitting where the compute engines, the algorithms, and the data all intersect, tensors are at the heart of deep learning, and as demonstrated, they easily represent high-order relationships. Tensor-based models often discover hidden relationships in the data that a human did not see and could not have programmed as a feature. And like linear algebra, tensor algebra is parallelizable, which brings to mind Einstein’s advice: “Everything should be made as simple as possible, but not simpler.”