Coordinate Changes in Linear Algebra


Ancient Greek philosophers used to study mathematics, because mathematical thinking provided an ideal model of philosophical thought, free of the complications of hairier subjects in philosophy like ethics. Plato’s dialogue Meno, for example, uses a mathematical demonstration to probe the nature of knowledge. Although at times mathematics can seem like it has no connection to the real world, occasionally a deep understanding of some mathematical concepts can give clarity to our ordinary ways of thinking. One of the most beautiful examples of this comes from linear algebra.

To set the stage, we should first recall the phrase, “the map is not the territory”.

A map is not the territory it represents, but, if correct, it has a similar structure to the territory, which accounts for its usefulness.

— Alfred Korzybski

In the context of linear algebra, this idea is what students need in order to understand the difference between a vector and the coordinates of that vector. Students usually get confused about this because the first vector spaces they encounter are spaces like \mathbb{R}^3, where vectors are written out with notation like \vec{v} = \begin{bmatrix} 1.5 \\ 2 \\ -1 \end{bmatrix}. In this special case, it is okay to identify the vector \vec{v} with a triple of real numbers, because \mathbb{R}^3 has a canonical coordinate system (due to the way \mathbb{R}^3 is constructed). Here, vectors really are their own coordinates (i.e., the map is the territory)!

When the Map is not the Territory

However, this breaks down once you consider other vector spaces like \mathbb{P}_2, the space of polynomials of degree at most two. A polynomial of degree at most two is a function which can be written in the form p(t) = at^2 + bt + c, with a, b, and c real numbers. It may be tempting to then identify p(t) with the triple of numbers \begin{bmatrix} a \\ b \\ c \end{bmatrix}, but this would be a mistake! For one, although it is possible to expand all polynomials of degree at most two in terms of t^2, t, and 1, they can also be expanded in terms of (t-1)^2, (t-1), and 1. If we take a polynomial like p(t) = 3t^2 + 4t + 1, we could re-express it, just as validly, as p(t) = 3(t-1)^2  + 10(t-1) + 8. Indeed, a little basic algebra shows that:

    \[3(t-1)^2 + 10(t-1) + 8 = (3t^2 - 6t + 3) + (10t - 10) + 8 = 3t^2 + 4t + 1\]

This means the same polynomial p(t), depending on which basis we expand it in, could be represented either as \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} or as \begin{bmatrix} 3 \\ 10 \\ 8 \end{bmatrix} (among others). If Alice were to use the basis \{t^2, t, 1\} and Bob were to use the basis \{(t-1)^2, (t-1), 1\}, and both were to identify p(t) with its coordinate vector, Alice and Bob would arrive at a contradiction via the chain of equalities:

    \[\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} = p(t) = \begin{bmatrix} 3 \\ 10 \\ 8 \end{bmatrix}\]
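As a quick sanity check that both coefficient triples really do describe the same polynomial, here is a minimal Python sketch. The function names `p_alice` and `p_bob` are made up for this illustration. Since two polynomials of degree at most two that agree at three or more points must be identical, comparing them at a handful of sample points is enough.

```python
def p_alice(t):
    # p(t) expanded in Alice's basis {t^2, t, 1} with coordinates (3, 4, 1).
    return 3 * t**2 + 4 * t + 1


def p_bob(t):
    # p(t) expanded in Bob's basis {(t-1)^2, (t-1), 1} with coordinates (3, 10, 8).
    return 3 * (t - 1)**2 + 10 * (t - 1) + 8


# Any three or more distinct points suffice for polynomials of degree <= 2.
sample_points = [-2, -1, 0, 1, 2, 5]
agree = all(p_alice(t) == p_bob(t) for t in sample_points)
print(agree)  # True
```

The two functions are the same territory; only the coefficient triples used to write them down differ.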

Resolving the Contradiction

To resolve this apparent contradiction, all we have to do is recognize that different people may map the same territory in different ways. Alice might describe the polynomial p(t) using the coordinates “\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}”, and Bob might describe p(t) using the coordinates “\begin{bmatrix} 3 \\ 10 \\ 8 \end{bmatrix}”. Both are equally valid descriptions of p(t), but they are descriptions in different descriptive frameworks (i.e., in different languages).

One way of formalizing this idea is to use a special notation for “description of an object with respect to a given descriptive framework”. If x is an object (like a polynomial), and \mathcal{A} is a descriptive framework (like Alice’s coordinate system), then we use the notation [x]_{\mathcal{A}} to denote the description of object x in descriptive framework \mathcal{A}. In this notation, we can see how the contradiction above no longer goes through:

    \[\begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} = [p(t)]_{\mathcal{A}} \neq [p(t)]_{\mathcal{B}} =  \begin{bmatrix} 3 \\ 10 \\ 8 \end{bmatrix}\]

We have no more reason to believe that [p(t)]_{\mathcal{A}} (the description of p(t) in Alice’s descriptive framework) is the same as [p(t)]_{\mathcal{B}} (the description of p(t) in Bob’s descriptive framework) than we have to believe that the word used to describe shoes in English is the same as the word used to describe shoes in French. Of course, it may happen by sheer coincidence that the descriptions of an object in two different languages are the same, but this is not to be expected.

However, that doesn’t mean that the descriptions of objects in two different descriptive frameworks bear no relation at all. Indeed, recalling the Alfred Korzybski quote above, if \mathcal{A} has a similar structure to the territory, and \mathcal{B} has a similar structure to the territory, then \mathcal{A} and \mathcal{B} should have similar structure to each other!

Deriving a Translation Rule

One way to capture the similarity in structure between two descriptive frameworks is to describe a translation between them. Using Alice and Bob’s descriptive frameworks for polynomials of degree at most two from earlier, it turns out we can derive a straightforward algebraic translation from [p(t)]_{\mathcal{A}} to [p(t)]_{\mathcal{B}} for any polynomial p(t) of degree at most two. Let’s derive this now.

First, let’s suppose that [p(t)]_{\mathcal{A}} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}. Then we can derive:

    \[p(t) = at^2 + bt + c = a\big((t-1)^2 + 2(t-1) + 1\big) + b\big((t-1) + 1\big) + c = a(t-1)^2 + (2a + b)(t-1) + (a + b + c)\]

But this means

    \[[p(t)]_{\mathcal{B}} = \begin{bmatrix} a \\ 2a + b \\ a +  b + c \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix}\]

Since p(t) was arbitrary, we thus derive the following translation rule for all polynomials p(t) of degree at most two:

    \[[p(t)]_{\mathcal{B}} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 1 & 1  \end{bmatrix}[p(t)]_{\mathcal{A}}\]
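To see the translation rule in action, the sketch below applies the matrix to Alice's coordinates for the earlier example p(t) = 3t^2 + 4t + 1. The names `CHANGE_A_TO_B` and `mat_vec` are introduced here for illustration only, and plain Python lists stand in for matrices and vectors.

```python
# Change-of-coordinates matrix from Alice's basis {t^2, t, 1}
# to Bob's basis {(t-1)^2, (t-1), 1}, as derived in the text.
CHANGE_A_TO_B = [
    [1, 0, 0],
    [2, 1, 0],
    [1, 1, 1],
]


def mat_vec(matrix, vector):
    # Ordinary matrix-vector product on nested lists.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


coords_a = [3, 4, 1]                          # [p(t)]_A
coords_b = mat_vec(CHANGE_A_TO_B, coords_a)   # [p(t)]_B
print(coords_b)  # [3, 10, 8]
```

The result matches the coordinates Bob found by expanding p(t) by hand.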

What we notice in this case is that there is a linear relationship between [p(t)]_{\mathcal{A}} and [p(t)]_{\mathcal{B}}. In the context of linear algebra, the translation rule between [p(t)]_{\mathcal{A}} and [p(t)]_{\mathcal{B}} is known as a change of coordinates, and it is described by an invertible matrix.
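Invertibility can be checked concretely. Expanding a(t-1)^2 + b(t-1) + c back in powers of t gives at^2 + (-2a + b)t + (a - b + c), which suggests the matrix called `B_TO_A` below for translating from Bob's coordinates back to Alice's. The following sketch (all names are ours, not the author's) confirms that the two matrices are inverse to each other.

```python
A_TO_B = [[1, 0, 0], [2, 1, 0], [1, 1, 1]]     # Alice -> Bob (from the text)
B_TO_A = [[1, 0, 0], [-2, 1, 0], [1, -1, 1]]   # Bob -> Alice (expanded by hand)
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def mat_mul(x, y):
    # Ordinary matrix product on nested lists.
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


print(mat_mul(B_TO_A, A_TO_B) == IDENTITY)  # True
print(mat_mul(A_TO_B, B_TO_A) == IDENTITY)  # True
```

Translating from Alice to Bob and back again therefore returns the coordinates we started with.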

An Exercise to Test Your Understanding

Before reading on, try to answer the following question using your understanding of linear algebra and the Alfred Korzybski quote above:

Why is the relationship between [p(t)]_{\mathcal{A}} and [p(t)]_{\mathcal{B}} linear?

If you are stuck, here’s a hint: Alice and Bob’s coordinate systems are both linear coordinate systems. That is, [x+y]_{\mathcal{A}} = [x]_{\mathcal{A}} + [y]_{\mathcal{A}} and [cx]_{\mathcal{A}} = c[x]_{\mathcal{A}} (and likewise for \mathcal{B}).

Figured it out? If not, here’s the answer. The functions [\cdot]_{\mathcal{A}}: p(t) \mapsto [p(t)]_{\mathcal{A}} and [\cdot]_{\mathcal{B}}: p(t) \mapsto [p(t)]_{\mathcal{B}} are invertible linear maps. Thus the map [\cdot]_{\mathcal{B}} \circ [\cdot]_{\mathcal{A}}^{-1} : \mathbb{R}^3 \rightarrow \mathbb{R}^3 is linear (and invertible). Now tracing back definitions, \big([\cdot]_{\mathcal{B}} \circ [\cdot]_{\mathcal{A}}^{-1} \big) [p(t)]_{\mathcal{A}} = [\cdot]_{\mathcal{B}} \big( [\cdot]_{\mathcal{A}}^{-1}  [p(t)]_{\mathcal{A}} \big) = [\cdot]_{\mathcal{B}} (p(t)) = [p(t)]_{\mathcal{B}}, so [\cdot]_{\mathcal{B}} \circ [\cdot]_{\mathcal{A}}^{-1} is indeed the translation from \mathcal{A} descriptions/coordinates to \mathcal{B} descriptions/coordinates. In plain English: the relationship between [p(t)]_{\mathcal{A}} and [p(t)]_{\mathcal{B}} is linear because the relationships between [p(t)]_{\mathcal{A}} and p(t) and between [p(t)]_{\mathcal{B}} and p(t) are both linear.
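The argument above can also be checked numerically. In the sketch below, a polynomial is represented as a plain Python function (the territory), and the hypothetical helpers `coords_a` and `coords_b` recover its coordinates from a few sampled values. This sampling trick is just one convenient way to implement the coordinate maps and assumes the input really is a polynomial of degree at most two.

```python
from fractions import Fraction


def coords_a(p):
    # Coordinates in Alice's basis {t^2, t, 1}, recovered from p(-1), p(0), p(1).
    pm1, p0, p1 = Fraction(p(-1)), Fraction(p(0)), Fraction(p(1))
    return [(p1 + pm1) / 2 - p0, (p1 - pm1) / 2, p0]


def coords_b(p):
    # Coordinates in Bob's basis {(t-1)^2, (t-1), 1}, recovered from p(0), p(1), p(2).
    p0, p1, p2 = Fraction(p(0)), Fraction(p(1)), Fraction(p(2))
    return [(p2 + p0) / 2 - p1, (p2 - p0) / 2, p1]


def p(t):
    return 3 * t**2 + 4 * t + 1


def q(t):
    return t**2 - 2 * t + 7


def p_plus_q(t):
    return p(t) + q(t)


def five_p(t):
    return 5 * p(t)


# Linearity of Alice's coordinate map: [x + y]_A = [x]_A + [y]_A and [cx]_A = c[x]_A.
additive = coords_a(p_plus_q) == [x + y for x, y in zip(coords_a(p), coords_a(q))]
homogeneous = coords_a(five_p) == [5 * x for x in coords_a(p)]

# The translation [p]_A -> [p]_B agrees with the matrix derived in the text.
A_TO_B = [[1, 0, 0], [2, 1, 0], [1, 1, 1]]
translated = [sum(m * v for m, v in zip(row, coords_a(p))) for row in A_TO_B]
matches_matrix = translated == coords_b(p)

print(additive, homogeneous, matches_matrix)  # True True True
```

The same linearity checks pass for `coords_b`, which is why composing one coordinate map with the inverse of the other yields a linear map on \mathbb{R}^3.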

Picturing What’s Going On

The following diagram explains what’s going on when we translate between \mathcal{A} and \mathcal{B} coordinates, and vice versa:

This diagram applies far more generally than you might expect. If we replace \mathbb{P}_2 with an arbitrary domain (corresponding to a territory), and the two copies of \mathbb{R}^3 with arbitrary spaces of descriptions, then the blue formulas for translation between \mathcal{A} and \mathcal{B} descriptions are still valid, so long as [\cdot]_{\mathcal{A}} and [\cdot]_{\mathcal{B}} are bijective functions. Since [\cdot]_{\mathcal{A}} and [\cdot]_{\mathcal{B}} are maps from objects of the domain to descriptions, this is just another way of saying that every object in the domain has a unique \mathcal{A}-description (resp. \mathcal{B}-description), and every \mathcal{A}-description (resp. \mathcal{B}-description) corresponds to a unique object in the domain. Although this is not true for natural languages (because sometimes descriptions fit more than one object, and some objects have many different descriptions), it is often true for artificial languages / descriptive frameworks.
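To illustrate the general, non-linear version of this picture, here is a toy sketch in Python. The domain (the integers 1 through 5) and the two descriptive frameworks (decimal strings and Roman numerals) are invented for this example. Each framework is a bijection from objects to descriptions, stored as a dictionary, and translation is the composition of one description map with the inverse of the other.

```python
# Two hypothetical descriptive frameworks for the same small domain {1, ..., 5}.
describe_a = {1: "1", 2: "2", 3: "3", 4: "4", 5: "5"}    # decimal strings
describe_b = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}  # Roman numerals


def invert(bijection):
    # Invert an object -> description map; fail if it is not one-to-one.
    inverse = {desc: obj for obj, desc in bijection.items()}
    if len(inverse) != len(bijection):
        raise ValueError("two objects share a description, so the map is not invertible")
    return inverse


def translate(description, source, target):
    # Apply target after the inverse of source: description -> object -> description.
    return target[invert(source)[description]]


print(translate("4", describe_a, describe_b))    # IV
print(translate("III", describe_b, describe_a))  # 3
```

Nothing here is linear; the translation works purely because both description maps are bijections.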

What’s special in the case of linear algebra is just that these functions [\cdot]_{\mathcal{A}} and [\cdot]_{\mathcal{B}} are invertible linear maps, which implies all the arrows in this diagram are invertible linear maps.

