Monday, November 10, 2014

Evaluating polynomials at regular-space grids

Remember the successive differences approach we learned in school to analyze series of numbers? Consider the set of 4th powers: 0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561, ...
Finding the differences between successive terms we get another sequence:
1, 15, 65, 175, 369, 671, 1105, 1695, 2465, ...
Repeating this procedure, we eventually get:
0,    1,  16,  81,  256,  625,  1296,  2401,  4096,    6561, ...
    1,  15,  65,  175, 369,   671,  1105,   1695,   2465, ...
       14, 50, 110,  194,  302,   434,   590,     770, ...
          36,  60,    84,   108,   132,  156,     180, ...
              24,   24,    24,     24,     24,     24, ...

Each row is obtained by adding the numbers in the row below to obtain the next term. We notice that the fifth row is constant, i.e. we can obtain the fourth row by simply adding a constant to a previous term. This shows that we can compute each row simply by addition.  This approach is valid for any polynomial and is the essence of Nuttall's algorithm (Ref. [1]) to evaluate polynomials on a equally spaced grid. Consider the 3rd order polynomial $p(x) = x^3$ and we want to evaluate it at $x = x_0+n\cdot dx$ for $n=0,1,2, ...$. For simplicity (and without loss of generality, as we can always translate and scale the polynomial), assume that $x_0 = 0$ and $dx = 1$.  Using successive differences, we obtain:
$p(n) = p(n-1) + d_2(n)$ where $d_2(n) = 3n^2-3n+1$.
$d_2(n) = d_2(n-1) + d_1(n)$ where $d_1(n) = 6n-6$.
$d_1(n) = d_1(n-1) + d_0(n)$ where $d_0(n) = 6$.

The main idea is that $d_2(n) = p(n)-p(n-1)$ is a polynomial of a lower degree. We can then compute $d_i(n)$ iteratively from $d_i(n-1)$ by adding $d_{i+1}(n)$. Thus $p(n)$ can be computed using the $3$ additions above with the initial conditions $d_0(0) = 6, d_1(0) = -6, d_2(0) = 1, p(0) = 0$.

Given the initial conditions at $n=0$, for a $m$-degree polynomial, this algorithm will require $mk$ additions to compute $p(n)$ for $n = 1, ..., k$.

For the On-Line Encyclopedia of Integer Sequences (OEIS), there are several sequences where such polynomial evaluations are needed.  I have written several Python programs to compute these sequences using Nuttall's algorithm.  See for example: A161710A161715A235810A138091A075904, A243298, A007487.

Consider the interpolating polynomial for the divisors of 24 (OEIS A161710):
$\frac{-6n^7 + 154n^6 - 1533n^5 + 7525n^4 - 18879n^3 + 22561n^2 - 7302n+2520}{2520}$

Since the polynomial evaluates to integers for all integers $n$, given the initial conditions, Nuttall's algorithm evaluates this polynomial at integers $n=1,2,...$ using only integer arithmetic, without the need for multiplication or division:

A161710_list, m = [1], [-12, 80, -223, 333, -281, 127, -23, 1]
for _ in range(1, 10**3):
    for i in range(7):
        m[i+1] += m[i]

In Ref. [2], it was shown that the initial conditions take some computation and Nuttall's algorithm is more efficient than the direct approach in terms of the number of multiplication operations needed if the number of gridpoints is larger than $o(m^2)$ where $m$ is the degree of the polynomial. However, for the case above the savings in the number of multiplications is more substantial for 2 reasons.  First, the initial conditions can be precomputed and hardcoded as in the code above.
Second, in the above example only integer arithmetic is used and the numbers have increasing number of digits as $n\rightarrow\infty$. The initial conditions have smaller number of bits than the numbers that are multiplied in the direct approach to obtain $p(n)$ when $n$ is large and the number of multiplication operations needed on a fixed wordlength computer increases for larger numbers.

In Ref. [1], Nuttall also considered functions of the form $f(x) = e^{p(x)}$ where $p$ is a polynomial.  In this case $f(n)/f(n-1) = e^{p(n)-p(n-1)}$, so the successive differences in the polynomial becomes successive ratios in $f$ and the additions become multiplications.

Extensions of this algorithm to polynomials in multiple variables can be found in Ref. [3].

[1] Albert H. Nuttall, "Efficient Evaluation of Polynomials and Exponentials of Polynomails for Equispaced Arguments," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 10, pp. 1486-1487, October 1987.
[2] S. C. Datta Roy and Shailey Minocha, "A Note on "Efficient Evaluation of Polynomials and Exponentials of Polynomails for Equispaced Arguments",", IEEE Transactions on Signal Processing, vol. 39, no. 11, pp. 2554-2556, November 1991.
[3] N. K. Bose, "Efficient Evaluation of Multivariate Polynomials Over Multidimensional Grids," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 10, pp. 1630-1631, November 1987.

Thursday, October 2, 2014

IBM Smarter Cities Challenge in Baton Rouge

I am spending 3 weeks in Baton Rouge, Louisiana participating in the IBM Smarter Cities Challenge. You can find the team blog here:

Here is a view outside of the window of the office that we are working out of, with a view of the Mississippi river and the Horace Wilkinson bridge:

Friday, June 6, 2014

The Hoffman-Wielandt theorem

The following interesting theorem bounds the "distance" between the eigenvalues of normal matrices using the Frobenius matrix norm.

Theorem [Hoffman-Wielandt 1953]: Let $A$ and $B$ be normal matrices with eigenvalues \(a_1,\cdots, a_n\) and \(b_1,\cdots, b_n\), respectively.  There exists an ordering of the eigenvalues such that
$$ \sum_i |a_i-b_i|^2 \leq \|A-B\|_{F}^2$$
where $\|\cdot\|_{F}$ is the Frobenius matrix norm.

Let us look at the proof of this result from Ref. [1].  Note that \(\|A\|_{F}=tr(AA^{*})= tr(A^*A)\) is unitarily invariant, i.e. if $V$ and $U$ are unitary matrices, then $\|UAV\|_F = \|A\|_F$.  Since both $A$ and $B$ are normal, $A = VA_0V^*$ and $B = WB_0W^*$ where $V$ and $W$ are unitary matrices and $A_0$ and $B_0$ are diagonal matrices of the eigenvalues of $A$ and $B$ respectively.  By the unitarily invariant property of the Frobenius norm,  $$\begin{eqnarray}\|A-B\|_F &=& \|VA_0V^*-WB_0W^*\|_F \\ &=& \|A_0-V^*WB_0W^*V\|_F \\
&=& \|A_0-UB_0U^*\|_F\end{eqnarray}$$ where $U = V^*W$ is a unitary matrix.  Thus the Hoffman-Wielandt theorem states that among all unitary matrices $U$,  $\|A_0-UB_0U^*\|_F$ is minimized (not necessarily uniquely) when $U$ is equal to a permutation matrix $P$.  Now
$$ \begin{eqnarray}\|A_0-UB_0U^*\|_F &=& tr((A_0-UB_0U^*)(A_0^*-UB_0^*U^*)) \\&=&tr(A_0A_0^*+B_0B_0^*)-tr(UB_0U^*A_0^*)\\&&-tr(A_0UB_0^*U^*)
\end{eqnarray} $$
Since $UB_0U^*A_0^*$ and $A_0UB_0^*U^*$ are conjugate of each other, this means that
$$ \|A_0-UB_0U^*\|_F = \sum_i |a_i|^2+|b_i|^2-2Re(tr(UB_0U^*A_0^*)) $$
Next note that $tr(UB_0U^*A_0^*) = \sum_{ij} b_i\overline{a}_j u_{ij}\overline{u}_{ij}$ and thus
$Re(tr(UB_0U^*A_0^*)) = \sum_{ij} Re(b_i\overline{a}_j) x_{ij}$
where $x_{ij} = |u_{ij}|^2$.  Since the rows and columns of $U$ are orthonormal vectors, this implies that
$X = \{x_{ij}\}$ is a bi-stochastic matrix and minimizing  $\|A_0-UB_0U^*\|_F$ will result in a value no lower than minimizing the linear function $\sum_{ij} Re(b_i\overline{a}_j) x_{ij}$ over the convex set of all bi-stochastic matrices $X$.  By the definition of convexity, the value of a linear function of any point in the convex set is a convex combination of the values of the function evaluated at the vertices.  This implies that the minimum value of a linear function over a convex set will be attained at a vertex.  By the Birkhoff-von Neumann theorem, the set of vertices of  the convex set of bi-stochastic matrices is exactly the set of  permutation matrices and this means that the minimum occurs when $X$ is a permutation matrix, i.e. the minimum is also achieved for $\|A_0-UB_0U^*\|_F$ when $U$ is a permutation matrix.

[1] A. J. Hoffman and H. W. Wielandt, "The variation of the spectrum of a normal matrix," Duke Mathematical Journal, vol. 20, no. 1 (1953), pp. 37-40.

Saturday, May 24, 2014

Ramsey theory

Ramsey theory is a fascinating subfield of combinatorics.  It deals with the idea that you can find always find regular substructures in large enough arbitrary, not necessarily regular, structures.  It can be considered a generalization of the Pigeonhole Principle.  A simple case of the general Ramsey's theorem is the following statement: among any group of 6 people, there is either a group of 3 people who mutually know each other, or there is a group of 3 people who mutually do not know each other.  Couched in graph theory, this is equivalent to the statement that for any 2-coloring of of the complete graph on 6 vertices $K_6$, there exists a subgraph of 3 vertices that is monochromatic, i.e. all the edges in the subgraph are colored with the same color.

For this simple case, the proof is quite easy. Pick a person $A$ from the group.  If $A$ has $3$ or more friends, pick $3$ of them, say $B$,$C$ and $D$.  If $2$ people among these friends are friends with each other, then together with $A$ they would form a group of 3 mutual friends.  Otherwise $B$,$C$ and $D$ mutually do not know each other.
If $A$ has $2$ or less friends, then there are at least $3$ people who $A$ does not know and you have a similar argument.  Pick $3$ among them, if they are not mutual friends, then there are $2$ of them who are not friends, then these $2$ people along with $A$ form a group who mutually do not know each other.

Saturday, May 10, 2014

Triangle theorems

Most of the theorems about triangles that I read about date from antiquity.  For instance, Euclid's Elements contains many propositions such as
  • The sum of the length of two sides of a triangle is larger than the length of the third side.
  • An isosceles triangle has the two angles at the base equal to each other.
  • Pythagoras Theorem
An interesting theorem is Heron's formula (probably known in Archimedes' time) for the area $A$ of a triangle with sides of lengths a, b,c:
\[ A = \sqrt{s(s-a)(s-b)(s-c)}\]
where $s$ is the semiperimeter defined as $s = \frac{1}{2}(a+b+c)$.  This theorem allows you to calculate the area of a triangle using solely the lengths of the 3 sides.  A similar theorem was discovered by the Chinese independently from the Greeks.

Of more recent vintage is Routh's theorem, published in 1896, that describes the ratio between a triangle and another triangle generated via the intersection of 3 cevians.

I was pleasantly surprised to hear about 2 new theorems about triangles that have been proved only relatively recently.

  1. Conway's little theorem (1976): A triangle is equilateral if and only if the ratio between the length of any 2 sides is rational and the ratio between any 2 angles is rational.
  2. Grieser and Maronna 2013: Up to congruence, a triangle is determined uniquely by its area, perimeter and the sum of the reciprocals of its angles. 
  • J. Conway, "A Characterization of the Equilateral Triangles and Some Consequences," The Mathematical Intelligencer, 20 March 2014.
  • D. Grieser and S. Maronna, "Hearing the Shape of a Triangle," Notices of AMS, vol. 60, no. 11, pp. 1440-1447, 2013. 

Sunday, May 4, 2014

Causality versus correlation

One of the challenges in data analysis is to determine whether correlated data implies causation.  For instance, sales of sunscreen and ice cream may be correlated during the year, but that does not mean that purchase of sunscreen leads to purchase of ice cream or vice versa.

Correlated data means then when one type of data occurs, so does another type of data and when one type of data is missing, so does the other type of data. From a logic point of view, consider two statements $P$ and $Q$ and we look at when they are true or false from observational data.  The presence or absence of these 2 statements being true can be collected in a truth table.  Let us assume that correlated data means the truth table has the following form (although the case of tautology below shows that this is not always the best assumption):

$P$$Q$$P$ and $Q$ are Correlated
i.e. we observe that $P$ and $Q$ are either both false or either both true.  There are 4 ways to fill in the 2 question marks above.

In mathematical logic, the notation $P\rightarrow Q$ is interpreted as P implies Q.  The only time this statement is false is if P is true, and Q is false.  Logically it is equivalent to $\neg P \vee Q$ and to its contrapositive $\neg Q \rightarrow \neg P$ and has the truth table:
$P$ $Q$ $P\rightarrow Q$
00 1
0 1 1
1 00
1 1 1

However, this must not be interpreted as P causing Q, as $P\rightarrow Q$ is still true if $P$ is false and $Q$ is true.  Just because the P being true resulted in Q being true, does not mean that Q cannot be true independent of P.  The lesson here is that the definition of "P implies Q" is different from our definition of "P causes Q".

The following truth table of $\neg(P\oplus Q)$, the exclusive-nor of P and Q, is a more appropriate way to capture the meaning of "$P$ causes $Q$".  This is also equivalent to the statement $P$ if and only if $Q$, i.e. $(P\rightarrow Q) \wedge (Q\rightarrow P)$, also written as $P\equiv Q$.

$P$$Q$$\neg(P\oplus Q)$

However, we have a dilemma here.  The same argument can be used to say that this truth table is appropriate to indicate that $Q$ causes $P$. This shows the difficulty of deriving causation solely from observational data.  To determine causality we need a method to apply control, say change the truth value of P and see how it affects the truth value of Q.

What about the other 2 ways to fill in the question marks? The following truth table corresponds to
$Q\rightarrow P$:

$P$$Q$$Q\rightarrow P$

Shows that $Q$ implies $P$, but again not necessary that $Q$ causes $P$.

The following truth table corresponding to a tautology, and indicates that P and Q are uncorrelated since
any combination of the truth values of P and Q are possible.


Thus just the fact that we always observe either $P$ and $Q$ both being true or both being false does not mean that they are correlated, it could be that we haven't observed the other combinations yet.

Saturday, May 3, 2014

Odd days in May

During lunch today, we were talking about how tomorrow, May 4, is Star Wars day, a play on words on the classical phrase in Star Wars, "May the force be with you" (May the fourth be with you), and is a day when Star Wars fans celebrate all things Star Wars.

As a kid, I have always been a fan of the Star Wars movies.  My brother and I were very much into special effects and very often we would go to the book store to buy magazines dedicated to the art of special effects in movies and we were members of the Star Wars fan club. We learned how the third movie was initially called "Revenge of the Jedi", but was changed later to "Return of the Jedi" because revenge seems uncharacteristic for a Jedi.  The very first movie that I saw in a movie theater in the US was "Return of the Jedi" (The original theatrical run, not the special edition in the late 1990's).  Many years later I introduced my wife to Star Wars by watching the special edition trilogy with her in the movie theater.

While pondering all this, I turned to my son and said, "Shouldn't any odd day in May (May 1, May 3, etc.) be called 'The Hunger Games Day'?".  There are 16 such days in May, so that is a lot of celebration.

Saturday, March 22, 2014

Peanut wrapper

I was on a flight some time ago and noticed the following on the wrapper of my bag of peanuts:

The second line seems kind of redundant, no?

Wednesday, March 19, 2014


I was talking to someone about the deposit we can get from soda cans and bottles when we returned them to the store for recycling and she remarked that the containers do a good job telling us that they are worth some money, because on the cans are written the words "ME 5¢".   Sometimes the messages are quite friendly; on one bottle it says "HI ME 5¢".

Sunday, March 16, 2014

Multiplication tricks

When I was a young boy I was shown the following trick to square certain numbers very quickly.  In particular, given a 2-digit number where the unit digit is 5, the square of this number can be easily computed with the following rule:

If the tens digit is $a$ and the unit digit is $5$, the square can be found by computing $a(a+1)$ and appending $25$ to the result.  

For instance $75^2$ is computed by multiplying $7$ and $8$ ($= 56$)and adding $25$ at the end to get $5625$.  This also work with $3$-digit numbers (and longer numbers).  For instance, to compute $115^2$, we compute $11\times 12 = 132$ and get $13225$.

To show why this is true is a simple exercise. A number with tens digit $a$ and unit digit $5$ is equal to $10a+5$.  Squaring this number results in $(10a+5)^2 = 100a^2 + 100a + 25 = 100a(a+1) + 25$ which corresponds to the rule above.

This same argument is true when multiplying two 2-digit numbers where the tens digits are equal and the unit digits add up to 10.  In this case we are multiplying $10a+b$ and $10a+c$ where $b+c =10$.  The product $(10a+b)(10a+c)$ is equal to $100a^2+100a+bc = 100a(a+1)+bc$. So the corresponding rule is:

Multiply $a$ with $a+1$ and append $bc$ at the end.  

The only caveat is that when $b=1$ and $c = 9$, you append $09$ (rather than just $9$) at the end. For instance to compute $83\times 87$, we get $8\times 9 = 72$ and append $3\times 7$ to get $7221$. Similarly for $124\times 126$, we compute $12\times 13 = 156$ and append $24$ to get $15624$.

You can use a similar trick to multiply two 2-digit number where the tens digits add up to 10 and the unit digits are the same, i.e. multiplying $10b+a$ with $10c+a$ where $b+c = 10$. The product is
$$(10b+a)(10c+a) = 100bc + 100a + a^2 = 100(bc+a) + a^2$$
Thus the corresponding rule is:

Multiply $b$ with $c$, add $a$ and then append $a^2$ to the result.

Again, if $a^2$ has only one digit, prefix it with a $0$ before appending it to $bc+a$.  For example, to compute $43\times 63$, we compute $4\times 6 + 3 = 27$ and append $3^2 = 09$ to obtain $2709$.

Sunday, March 9, 2014

Beyond superlatives

My son and I went skiing the other day and he asked me why double black diamond trails which are billed as "Extremely difficult" are more difficult than black diamond trails which are described as "Most difficult". He said, "How could there be something more difficult than 'most difficult'?"   Is "most difficult" more difficult or less difficult than "extremely difficult"? According to the New Oxford American Dictionary, "most" is defined as "greatest in amount or degree", whereas "extreme" is defined as "reaching a high or the highest degree".  So there is a sense where "extremely difficult" is the same as "most difficult", but there is an interpretation of the definitions where "extremely difficult" is less difficult than "most difficult".  Perhaps the wording in the ski resorts should be changed to more clearly indicate which trails are more difficult.

Wednesday, March 5, 2014

Fractals in a fairy tale

We were all watching the Oscars last Sunday and my son and I noticed that the song "Let it go" from the movie Frozen mentions fractals.  Fractal is a mathematical term and we were quite surprised but delighted to see it used in a children's song.  One of the most famous fractals is the Mandelbrot set, named after Benoit Mandelbrot, the father of fractals.  When I got my Commodore VIC-20 in the 1980's, my brother and I (and I am sure countless other kids playing with their personal computers) wrote a program to compute the Mandelbrot set, and zoom in and display various sections of it, (not knowing that 15 years later, I will work at the same place where Mandelbrot worked and have a chance to have lunch with him on several occasions).  At the time, it takes hours to compute a picture of the Mandelbrot set.

The definition of the Mandelbrot set is quite simple.  Points on the plane correspond to complex numbers and each point is used as a parameter $c$ in the equation $x_{n+1}= x_n^2 + c$.  The iterates $x_i$ starting from $x_0 = 0$ remain bounded if and only if $c$ belongs to the Mandelbrot set.  Since it may take many iterations before the iterates start to diverge, practically the computer program assumes the iterates are bounded if it hasn't diverged after a large number of iterations.