## Monday, November 10, 2014

### Evaluating polynomials on equally spaced grids

Remember the successive differences approach we learned in school to analyze series of numbers? Consider the set of 4th powers: 0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561, ...
Finding the differences between successive terms we get another sequence:
1, 15, 65, 175, 369, 671, 1105, 1695, 2465, ...
Repeating this procedure, we eventually get:
```
0   1   16   81   256   625   1296   2401   4096   6561  ...
  1   15   65   175   369   671   1105   1695   2465  ...
    14   50   110   194   302   434   590   770  ...
      36   60   84   108   132   156   180  ...
        24   24   24   24   24   24  ...
```
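The difference rows above can be reproduced in a few lines of Python (the helper name `diffs` is my own):

```python
def diffs(seq):
    """Successive differences of a sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]

fourth_powers = [n**4 for n in range(10)]  # 0, 1, 16, 81, ...
rows = [fourth_powers]
while len(set(rows[-1])) > 1:  # stop once a row is constant
    rows.append(diffs(rows[-1]))
```

For the 4th powers, the loop stops at the fifth row, which is constant and equal to 24.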

Each term in a row is obtained by adding the term in the row below to the preceding term in the same row. We notice that the fifth row is constant, i.e. each term of the fourth row is obtained by simply adding a constant to the previous term. This shows that we can compute every row using additions alone. This approach is valid for any polynomial and is the essence of Nuttall's algorithm (Ref. [1]) for evaluating polynomials on an equally spaced grid. Consider the 3rd degree polynomial $p(x) = x^3$, which we want to evaluate at $x = x_0+n\cdot dx$ for $n=0,1,2,\ldots$. For simplicity (and without loss of generality, as we can always translate and scale the polynomial), assume that $x_0 = 0$ and $dx = 1$. Using successive differences, we obtain:
$p(n) = p(n-1) + d_2(n)$ where $d_2(n) = 3n^2-3n+1$.
$d_2(n) = d_2(n-1) + d_1(n)$ where $d_1(n) = 6n-6$.
$d_1(n) = d_1(n-1) + d_0(n)$ where $d_0(n) = 6$.

The main idea is that $d_2(n) = p(n)-p(n-1)$ is a polynomial of a lower degree. We can then compute $d_i(n)$ iteratively from $d_i(n-1)$ by adding $d_{i+1}(n)$. Thus $p(n)$ can be computed using the $3$ additions above with the initial conditions $d_0(0) = 6, d_1(0) = -6, d_2(0) = 1, p(0) = 0$.
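These three recurrences, started from the initial conditions above, evaluate $n^3$ using only additions; a minimal sketch:

```python
# Evaluate p(n) = n^3 for n = 0..9 using only additions, starting
# from the initial conditions d0(0) = 6, d1(0) = -6, d2(0) = 1, p(0) = 0.
d0, d1, d2, p = 6, -6, 1, 0
cubes = [p]
for n in range(1, 10):
    d1 += d0   # d1(n) = d1(n-1) + d0(n)
    d2 += d1   # d2(n) = d2(n-1) + d1(n)
    p += d2    # p(n)  = p(n-1)  + d2(n)
    cubes.append(p)
```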

Given the initial conditions at $n=0$, for an $m$-th degree polynomial this algorithm requires $mk$ additions to compute $p(n)$ for $n = 1, \ldots, k$.
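As a sketch of the general scheme (the function name and interface are my own, not Nuttall's), the initial conditions can be read off a forward-difference table built from the values $p(0),\ldots,p(m)$, after which each further value costs $m$ additions:

```python
def nuttall_extend(first_values, k):
    """Given p(0), ..., p(m) for a polynomial p of degree m, return
    [p(m+1), ..., p(m+k)] using m additions per new value."""
    d = list(first_values)
    m = len(d) - 1
    # Build the forward-difference table in place: afterwards d[i] = (delta^i p)(0).
    for i in range(1, m + 1):
        for j in range(m, i - 1, -1):
            d[j] -= d[j - 1]
    out = []
    for n in range(m + k):
        # Advance every difference one step; d[0] becomes p(n+1).
        for i in range(m):
            d[i] += d[i + 1]
        if n >= m:
            out.append(d[0])
    return out
```

For example, `nuttall_extend([0, 1, 16, 81, 256], 3)` continues the 4th powers with `[625, 1296, 2401]`.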

For the On-Line Encyclopedia of Integer Sequences (OEIS), there are several sequences where such polynomial evaluations are needed. I have written several Python programs to compute these sequences using Nuttall's algorithm. See for example: A161710, A161715, A235810, A138091, A075904, A243298, A007487.

Consider the interpolating polynomial for the divisors of 24 (OEIS A161710):
$\frac{-6n^7 + 154n^6 - 1533n^5 + 7525n^4 - 18879n^3 + 22561n^2 - 7302n+2520}{2520}$

Since the polynomial evaluates to integers for all integers $n$, given the initial conditions, Nuttall's algorithm evaluates this polynomial at integers $n=1,2,...$ using only integer arithmetic, without the need for multiplication or division:

```python
A161710_list, m = [1], [-12, 80, -223, 333, -281, 127, -23, 1]
for _ in range(1, 10**3):
    for i in range(7):
        m[i+1] += m[i]
    A161710_list.append(m[-1])
```

In Ref. [2], it was shown that computing the initial conditions takes some work, and that Nuttall's algorithm is more efficient than the direct approach in terms of the number of multiplication operations needed when the number of grid points is on the order of $m^2$ or larger, where $m$ is the degree of the polynomial. However, for the case above the savings in the number of multiplications are more substantial, for two reasons. First, the initial conditions can be precomputed and hardcoded, as in the code above.
Second, in the above example only integer arithmetic is used, and the numbers grow in size as $n\rightarrow\infty$. The initial conditions have far fewer bits than the numbers that must be multiplied in the direct computation of $p(n)$ when $n$ is large, and on a fixed-wordlength computer the number of machine multiplications needed grows with the size of the operands.

In Ref. [1], Nuttall also considered functions of the form $f(x) = e^{p(x)}$ where $p$ is a polynomial. In this case $f(n)/f(n-1) = e^{p(n)-p(n-1)}$, so the successive differences in the polynomial become successive ratios in $f$, and the additions become multiplications.
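As an illustration (the specific polynomial $p(x)=x^2$ is my own choice), evaluating $f(n) = e^{n^2}$ this way needs two multiplications per step, since the ratio $f(n)/f(n-1) = e^{2n-1}$ itself advances by the constant factor $e^2$:

```python
import math

# Evaluate f(n) = exp(n^2) for n = 0..7 using two multiplications per step.
const = math.exp(2.0)   # ratio of successive ratios: e^2
ratio = math.exp(1.0)   # f(1)/f(0) = e^(2*1 - 1) = e
f = 1.0                 # f(0) = e^0
values = [f]
for n in range(1, 8):
    f *= ratio          # f(n) = f(n-1) * e^(2n-1)
    ratio *= const      # next ratio: e^(2(n+1) - 1)
    values.append(f)
```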

Extensions of this algorithm to polynomials in multiple variables can be found in Ref. [3].

References:
[1] Albert H. Nuttall, "Efficient Evaluation of Polynomials and Exponentials of Polynomials for Equispaced Arguments," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 10, pp. 1486-1487, October 1987.
[2] S. C. Datta Roy and Shailey Minocha, "A Note on 'Efficient Evaluation of Polynomials and Exponentials of Polynomials for Equispaced Arguments'," IEEE Transactions on Signal Processing, vol. 39, no. 11, pp. 2554-2556, November 1991.
[3] N. K. Bose, "Efficient Evaluation of Multivariate Polynomials Over Multidimensional Grids," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 10, pp. 1630-1631, November 1987.

## Thursday, October 2, 2014

### IBM Smarter Cities Challenge in Baton Rouge

I am spending 3 weeks in Baton Rouge, Louisiana participating in the IBM Smarter Cities Challenge. You can find the team blog here: http://smartercitieschallenge.wordpress.com/

Here is the view from the window of the office we are working in, looking out over the Mississippi River and the Horace Wilkinson Bridge:

## Friday, June 6, 2014

### The Hoffman-Wielandt theorem

The following interesting theorem bounds the "distance" between the eigenvalues of normal matrices using the Frobenius matrix norm.

Theorem [Hoffman-Wielandt 1953]: Let $A$ and $B$ be normal matrices with eigenvalues $a_1,\cdots, a_n$ and $b_1,\cdots, b_n$, respectively.  There exists an ordering of the eigenvalues such that
$$\sum_i |a_i-b_i|^2 \leq \|A-B\|_{F}^2$$
where $\|\cdot\|_{F}$ is the Frobenius matrix norm.

Let us look at the proof of this result from Ref. [1]. Note that $\|A\|_{F}^2=tr(AA^{*})= tr(A^*A)$, and the Frobenius norm is unitarily invariant, i.e. if $V$ and $U$ are unitary matrices, then $\|UAV\|_F = \|A\|_F$. Since both $A$ and $B$ are normal, $A = VA_0V^*$ and $B = WB_0W^*$, where $V$ and $W$ are unitary matrices and $A_0$ and $B_0$ are diagonal matrices of the eigenvalues of $A$ and $B$ respectively. By the unitary invariance of the Frobenius norm, $$\begin{eqnarray}\|A-B\|_F &=& \|VA_0V^*-WB_0W^*\|_F \\ &=& \|A_0-V^*WB_0W^*V\|_F \\ &=& \|A_0-UB_0U^*\|_F\end{eqnarray}$$ where $U = V^*W$ is a unitary matrix. Thus the Hoffman-Wielandt theorem follows if we can show that, among all unitary matrices $U$, $\|A_0-UB_0U^*\|_F$ is minimized (not necessarily uniquely) when $U$ is a permutation matrix $P$. Now
$$\begin{eqnarray}\|A_0-UB_0U^*\|_F^2 &=& tr((A_0-UB_0U^*)(A_0^*-UB_0^*U^*)) \\&=&tr(A_0A_0^*+B_0B_0^*)-tr(UB_0U^*A_0^*)\\&&-tr(A_0UB_0^*U^*) \end{eqnarray}$$
Since $UB_0U^*A_0^*$ and $A_0UB_0^*U^*$ are Hermitian transposes of each other, their traces are complex conjugates, and thus
$$\|A_0-UB_0U^*\|_F^2 = \sum_i \left(|a_i|^2+|b_i|^2\right)-2Re(tr(UB_0U^*A_0^*))$$
Next note that $tr(UB_0U^*A_0^*) = \sum_{ij} \overline{a}_i b_j u_{ij}\overline{u}_{ij}$ and thus
$Re(tr(UB_0U^*A_0^*)) = \sum_{ij} Re(\overline{a}_i b_j) x_{ij}$
where $x_{ij} = |u_{ij}|^2$. Since the rows and columns of $U$ are orthonormal vectors, $X = \{x_{ij}\}$ is a bi-stochastic matrix, so the minimum of $\|A_0-UB_0U^*\|_F^2$ over unitary $U$ is no lower than the minimum of the affine function $\sum_i \left(|a_i|^2+|b_i|^2\right) - 2\sum_{ij} Re(\overline{a}_i b_j) x_{ij}$ over the (generally larger) convex set of all bi-stochastic matrices $X$. By the definition of convexity, the value of an affine function at any point of this set is a convex combination of its values at the vertices, which implies that its minimum over the set is attained at a vertex. By the Birkhoff-von Neumann theorem, the vertices of the convex set of bi-stochastic matrices are exactly the permutation matrices, so the minimum occurs when $X$ is a permutation matrix. Since choosing $U$ to be a permutation matrix $P$ realizes $X = P$, the minimum of $\|A_0-UB_0U^*\|_F$ is also achieved when $U$ is a permutation matrix.
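A quick numerical sanity check of the bound, using two arbitrary 2×2 real symmetric (hence normal) example matrices, whose eigenvalues have a closed form:

```python
import math
from itertools import permutations

def eig2(m):
    """Eigenvalues of a 2x2 real symmetric matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = m
    disc = math.sqrt((a - d) ** 2 + 4 * b * b)
    return [(a + d - disc) / 2, (a + d + disc) / 2]

A = [[3.0, 0.0], [0.0, 1.0]]   # arbitrary example matrices
B = [[2.0, 1.0], [1.0, 2.0]]
# Frobenius norm squared of A - B
fro2 = sum((A[i][j] - B[i][j]) ** 2 for i in range(2) for j in range(2))
# Best ordering of the eigenvalues
best = min(sum(abs(x - y) ** 2 for x, y in zip(eig2(A), perm))
           for perm in permutations(eig2(B)))
assert best <= fro2   # the Hoffman-Wielandt bound holds
```

Here both matrices happen to have eigenvalues 1 and 3, so the left-hand side is 0 while $\|A-B\|_F^2 = 4$.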

[1] A. J. Hoffman and H. W. Wielandt, "The variation of the spectrum of a normal matrix," Duke Mathematical Journal, vol. 20, no. 1 (1953), pp. 37-40.

## Saturday, May 24, 2014

### Ramsey theory

Ramsey theory is a fascinating subfield of combinatorics. It deals with the idea that you can always find regular substructures in large enough arbitrary, not necessarily regular, structures. It can be considered a generalization of the Pigeonhole Principle. A simple case of the general Ramsey's theorem is the following statement: among any group of 6 people, there is either a group of 3 people who mutually know each other, or there is a group of 3 people who mutually do not know each other. Couched in graph theory, this is equivalent to the statement that for any 2-coloring of the complete graph on 6 vertices $K_6$, there exists a subgraph of 3 vertices that is monochromatic, i.e. all the edges in the subgraph are colored with the same color.

For this simple case, the proof is quite easy. Pick a person $A$ from the group. If $A$ has $3$ or more friends, pick $3$ of them, say $B$, $C$ and $D$. If any $2$ of these friends are friends with each other, then together with $A$ they form a group of $3$ mutual friends. Otherwise $B$, $C$ and $D$ mutually do not know each other.
If $A$ has $2$ or fewer friends, then there are at least $3$ people whom $A$ does not know, and a similar argument applies: pick $3$ of them; if they are not all mutual friends, then $2$ of them are not friends, and these $2$ together with $A$ form a group of $3$ who mutually do not know each other.
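The statement for 6 people is small enough to verify exhaustively: there are $2^{15}$ two-colorings of the edges of $K_6$, and every one of them must contain a monochromatic triangle. A brute-force check in Python:

```python
from itertools import combinations, product

edges = list(combinations(range(6), 2))       # the 15 edges of K6
triangles = list(combinations(range(6), 3))   # the 20 triangles of K6

def has_mono_triangle(coloring):
    """True if some triangle has all three edges the same color."""
    color = dict(zip(edges, coloring))
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in triangles)

# Every 2-coloring of K6 contains a monochromatic triangle...
assert all(has_mono_triangle(c) for c in product((0, 1), repeat=15))

# ...while K5 does not: color the 5-cycle one color and its complement
# (also a 5-cycle) the other; neither contains a triangle.
edges5 = list(combinations(range(5), 2))
color5 = dict(zip(edges5, [1 if (j - i) in (1, 4) else 0 for i, j in edges5]))
assert not any(color5[(a, b)] == color5[(a, c)] == color5[(b, c)]
               for a, b, c in combinations(range(5), 3))
```

The second check shows that 6 is tight, i.e. the Ramsey number $R(3,3)$ equals 6.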

## Saturday, May 10, 2014

### Triangle theorems

Most of the theorems about triangles that I read about date from antiquity.  For instance, Euclid's Elements contains many propositions such as
• The sum of the lengths of any two sides of a triangle is larger than the length of the third side.
• An isosceles triangle has the two angles at its base equal to each other.
• The Pythagorean theorem.
An interesting theorem is Heron's formula (probably known in Archimedes' time) for the area $A$ of a triangle with sides of lengths $a$, $b$, $c$:
$A = \sqrt{s(s-a)(s-b)(s-c)}$
where $s$ is the semiperimeter defined as $s = \frac{1}{2}(a+b+c)$.  This theorem allows you to calculate the area of a triangle using solely the lengths of the 3 sides.  A similar theorem was discovered by the Chinese independently from the Greeks.
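Heron's formula in code form:

```python
import math

def heron_area(a, b, c):
    """Area of a triangle from its three side lengths (Heron's formula)."""
    s = (a + b + c) / 2   # semiperimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))
```

For the 3-4-5 right triangle, `heron_area(3, 4, 5)` gives `6.0`, matching the familiar $\frac{1}{2}\cdot 3\cdot 4$.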

Of more recent vintage is Routh's theorem, published in 1896, which gives the ratio between the area of a triangle and the area of the triangle formed by the pairwise intersections of 3 cevians.

I was pleasantly surprised to hear about 2 new theorems about triangles that have been proved only relatively recently.

1. Conway's little theorem (1976): A triangle is equilateral if and only if the ratio between the length of any 2 sides is rational and the ratio between any 2 angles is rational.
2. Grieser and Maronna 2013: Up to congruence, a triangle is determined uniquely by its area, perimeter and the sum of the reciprocals of its angles.
References:
• J. Conway, "A Characterization of the Equilateral Triangles and Some Consequences," The Mathematical Intelligencer, 20 March 2014.
• D. Grieser and S. Maronna, "Hearing the Shape of a Triangle," Notices of AMS, vol. 60, no. 11, pp. 1440-1447, 2013.

## Sunday, May 4, 2014

### Causality versus correlation

One of the challenges in data analysis is to determine whether correlated data implies causation.  For instance, sales of sunscreen and ice cream may be correlated during the year, but that does not mean that purchase of sunscreen leads to purchase of ice cream or vice versa.

Correlated data means that when one type of data occurs, so does the other, and when one type is absent, so is the other. From a logical point of view, consider two statements $P$ and $Q$, and suppose we observe from data when they are true or false. The presence or absence of these 2 statements being true can be collected in a truth table. Let us assume that correlated data means the truth table has the following form (although the case of the tautology below shows that this is not always the best assumption):

| $P$ | $Q$ | $P$ and $Q$ are correlated |
|-----|-----|----------------------------|
| 0   | 0   | 1                          |
| 0   | 1   | ?                          |
| 1   | 0   | ?                          |
| 1   | 1   | 1                          |
i.e. we observe that $P$ and $Q$ are either both false or both true. There are 4 ways to fill in the 2 question marks above.
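The four completions can be enumerated mechanically; each turns out to be a familiar connective (a small sketch, with the connectives written as Python lambdas):

```python
# The four ways to complete the table; each is a standard connective.
connectives = {
    "P <-> Q":   lambda p, q: int(p == q),        # exclusive-nor
    "P -> Q":    lambda p, q: int((not p) or q),  # implication
    "Q -> P":    lambda p, q: int((not q) or p),  # converse implication
    "tautology": lambda p, q: 1,
}
# Every connective agrees with the observed rows: 1 at (0,0) and (1,1).
for f in connectives.values():
    assert f(0, 0) == 1 and f(1, 1) == 1
# Their values on the two "?" rows (0,1) and (1,0) cover all 4 possibilities.
middles = {(f(0, 1), f(1, 0)) for f in connectives.values()}
assert middles == {(0, 0), (0, 1), (1, 0), (1, 1)}
```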

In mathematical logic, the notation $P\rightarrow Q$ is read as "P implies Q". The only time this statement is false is when P is true and Q is false. Logically it is equivalent to $\neg P \vee Q$ and to its contrapositive $\neg Q \rightarrow \neg P$, and has the truth table:
| $P$ | $Q$ | $P\rightarrow Q$ |
|-----|-----|------------------|
| 0   | 0   | 1                |
| 0   | 1   | 1                |
| 1   | 0   | 0                |
| 1   | 1   | 1                |

However, this must not be interpreted as P causing Q, since $P\rightarrow Q$ is still true when $P$ is false and $Q$ is true. Just because P being true results in Q being true does not mean that Q cannot be true independently of P. The lesson here is that the definition of "P implies Q" is different from our notion of "P causes Q".

The following truth table of $\neg(P\oplus Q)$, the exclusive-nor of P and Q, is a more appropriate way to capture the meaning of "$P$ causes $Q$".  This is also equivalent to the statement $P$ if and only if $Q$, i.e. $(P\rightarrow Q) \wedge (Q\rightarrow P)$, also written as $P\equiv Q$.

| $P$ | $Q$ | $\neg(P\oplus Q)$ |
|-----|-----|-------------------|
| 0   | 0   | 1                 |
| 0   | 1   | 0                 |
| 1   | 0   | 0                 |
| 1   | 1   | 1                 |

However, we have a dilemma here.  The same argument can be used to say that this truth table is appropriate to indicate that $Q$ causes $P$. This shows the difficulty of deriving causation solely from observational data.  To determine causality we need a method to apply control, say change the truth value of P and see how it affects the truth value of Q.

What about the other 2 ways to fill in the question marks? The following truth table corresponds to
$Q\rightarrow P$:

| $P$ | $Q$ | $Q\rightarrow P$ |
|-----|-----|------------------|
| 0   | 0   | 1                |
| 0   | 1   | 0                |
| 1   | 0   | 1                |
| 1   | 1   | 1                |

This shows that $Q$ implies $P$, but again not necessarily that $Q$ causes $P$.

The remaining way of filling in the question marks corresponds to a tautology and indicates that P and Q are uncorrelated, since any combination of the truth values of P and Q is possible:

| $P$ | $Q$ | tautology |
|-----|-----|-----------|
| 0   | 0   | 1         |
| 0   | 1   | 1         |
| 1   | 0   | 1         |
| 1   | 1   | 1         |