In [1]:
import numpy as np
import numpy.linalg as lin
import matplotlib.pylab as plt
import plotly.offline as py
import plotly.graph_objects as go

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

np.set_printoptions(precision=3, linewidth=150, suppress=True)
plt.style.use(['seaborn-whitegrid','data/cours.mplstyle'])

Let's calculate ${\bf x}^T \, A \, {\bf x}$ with NumPy

The goal is to plot the $J_A({\bf x})$ surface with ${\bf x} \in ℝ^2$ so that the poor little humans that we are can see something.

$$ J_A({\bf x}) = {\bf x}^T \, A \, {\bf x} $$

Later we will take ${\bf x} \in ℝ^n$.

For a single value x this calculation is simple: x.T @ A @ x, but we want to do it for a whole set of x.
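For example, on a single made-up point, with A taken as twice the identity for an easy check:

In [ ]:
x = np.array([1., 2.])
A = 2 * np.eye(2)    # twice the identity, just for this check
x.T @ A @ x          # 2 * (1**2 + 2**2) = 10.0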

We start by building a mesh, i.e. a set of ${\bf x}$ points for which we will calculate $J_A({\bf x})$. For this we use np.meshgrid. Let's see what it gives.

In [2]:
x = np.linspace(-1,1,3)
y = np.linspace(-1,2,4)

mesh = np.meshgrid(x,y) # gives the x's then the y's of the mesh
M = np.array(mesh)
print("shape of M:", M.shape)
print(M)
M = M.transpose([1,2,0])    # I prefer to have a 4x3 array of points
print("\nshape of M:", M.shape)
M
shape of M: (2, 4, 3)
[[[-1.  0.  1.]
  [-1.  0.  1.]
  [-1.  0.  1.]
  [-1.  0.  1.]]

 [[-1. -1. -1.]
  [ 0.  0.  0.]
  [ 1.  1.  1.]
  [ 2.  2.  2.]]]

shape of M: (4, 3, 2)
Out[2]:
array([[[-1., -1.],
        [ 0., -1.],
        [ 1., -1.]],

       [[-1.,  0.],
        [ 0.,  0.],
        [ 1.,  0.]],

       [[-1.,  1.],
        [ 0.,  1.],
        [ 1.,  1.]],

       [[-1.,  2.],
        [ 0.,  2.],
        [ 1.,  2.]]])

I want to apply ${\bf x}^T \, A \, {\bf x}$ to each point of my mesh. I start with the identity matrix so that I can easily check the calculations.

First I compute ${\bf x}^T \, A$ for all points of the mesh.

Test case with A = 2 Id

On a very small mesh, I work out how to write the calculation. I take A to be twice the identity matrix so that the results can be checked in one's head.

In [3]:
A = 2 * np.diag([1, 1])
MA = np.einsum("ijk, ka -> ija", M, A)  # here is a case where Einstein notation really simplifies life
MA                                      # k appears twice and not in the result, so we sum over it
Out[3]:
array([[[-2., -2.],
        [ 0., -2.],
        [ 2., -2.]],

       [[-2.,  0.],
        [ 0.,  0.],
        [ 2.,  0.]],

       [[-2.,  2.],
        [ 0.,  2.],
        [ 2.,  2.]],

       [[-2.,  4.],
        [ 0.,  4.],
        [ 2.,  4.]]])

I get 2 M, which is consistent.

In a less obvious case we could check a few individual points:

In [4]:
# check on one particular point
M[0,1] @ A
Out[4]:
array([ 0., -2.])

Note that I wrote ${\bf x} \, A$ and not ${\bf x}^T \, A$, but when we have a 1-D vector NumPy adapts and favors the vector-matrix product, which gives a vector. Thus M[0,1] @ A == M[0,1].T @ A because, as the documentation says, "Transposing a 1-D array returns an unchanged view of the original array".

If we want a real distinction between a column vector and a row vector, we must use a 2-D array of size 1xn or nx1.
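A quick sketch of that difference, reusing the A above (the names col and row are just for illustration):

In [ ]:
v = np.array([1., 2.])
print(v.T.shape)          # (2,)  -- transposing a 1-D array changes nothing
col = v.reshape(-1, 1)    # explicit 2-D column vector, shape (2, 1)
row = v.reshape(1, -1)    # explicit 2-D row vector, shape (1, 2)
print(row @ A @ col)      # [[10.]] : a 1x1 array, no longer a scalar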

In [5]:
np.einsum("ijk, ijk -> ij", MA, M)   # comme k n'est pas dans le résultat, c'est sur lui qu'on fait la somme
Out[5]:
array([[ 4.,  2.,  4.],
       [ 2.,  0.,  2.],
       [ 4.,  2.,  4.],
       [10.,  8., 10.]])

Since A is 2 times the identity, we should get, for every point, 2 times its squared norm. That checks out.
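We can confirm it for the whole mesh (a small sanity check, reusing the M and MA defined above):

In [ ]:
mz_test = np.einsum("ijk, ijk -> ij", MA, M)
np.allclose(mz_test, 2 * np.sum(M**2, axis=-1))   # True: 2 times the squared norm of each point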

A real case

We take a mesh of 100x100 and a random matrix A:

In [6]:
np.random.seed(2)
A = np.random.randint(-10,10, size=(2,2)) / 100
print(A)

x = np.linspace(-20,20,100)
y = np.linspace(-20,20,100)

mx, my = np.meshgrid(x,y) 
M = np.array(np.meshgrid(x,y)).transpose([1,2,0])

MA = np.einsum("ijk, ka -> ija", M, A)
mz = np.einsum("ijk, ijk -> ij", MA, M)
[[-0.02  0.05]
 [ 0.03 -0.02]]
In [7]:
trace = go.Surface(x=mx, y=my, z=mz)
fig = go.Figure(data=[trace])
py.iplot(fig, show_link=False)

We have a superb saddle point.

Of course, with another random seed we would get something else. Imagine what you would get if all the values of A were positive.
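For instance, with an all-positive matrix (chosen here purely for illustration, and also positive definite) the saddle turns into a bowl:

In [ ]:
A_pos = np.array([[0.05, 0.01],
                  [0.01, 0.05]])                   # all-positive, symmetric, positive definite
MA_pos = np.einsum("ijk, ka -> ija", M, A_pos)
mz_pos = np.einsum("ijk, ijk -> ij", MA_pos, M)
py.iplot(go.Figure(data=[go.Surface(x=mx, y=my, z=mz_pos)]), show_link=False)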

Let's optimize

If we have to do this calculation many times in high dimension, it would be nice to find the optimal solution. While Einstein notation helps here, it is not always the fastest. So let's compare it with other methods.

Using a J function and a loop

In [8]:
def J(x):
    return x @ A @ x

mz2 = np.array([J(x) for line in M for x in line]).reshape([M.shape[0], M.shape[1]])

np.max(np.square(mz2 - mz))  # to check that we get the same thing
Out[8]:
0.0
In [9]:
# let's compare the computation times:

%timeit np.einsum("ijk, ijk -> ij", np.einsum("ijk, ka -> ija", M, A), M)
143 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]:
%timeit np.array([J(x) for line in M for x in line]).reshape([M.shape[0], M.shape[1]])
19.7 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Einstein summation is more than 100 times faster than the Python loop.

Note that np.apply_along_axis is even worse:

In [11]:
%timeit np.array([np.apply_along_axis(J, 1, line) for line in M]).reshape([M.shape[0], M.shape[1]])
34.6 ms ± 989 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using np.tensordot

A tensor is an N-dimensional array, so here is a function that should be usable, since our set of points is stored in a 3-dimensional array.

In [12]:
%%timeit                              # %% to time the execution of the whole cell

MA = np.tensordot(M, A, axes=(2,1))   # we sum over axis 2 of M (the points) and axis 1 of A (the columns)
np.einsum("ijk, ijk -> ij", MA, M)    # I cannot find another way to do this operation
100 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

That is a gain of about 30% compared to einsum.

To get reliable measurements we would need cases larger than 100x100 and a machine that does nothing else, for example in single-user mode under Linux.
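As a side note, the two contractions can also be fused into a single einsum call; a quick sketch, equivalent in result (its speed is not measured here):

In [ ]:
mz3 = np.einsum("ijk, kl, ijl -> ij", M, A, M)   # x^T A x for every point in one call
np.allclose(mz3, mz)                             # True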

Conclusion

Roughly we have:

| Method    | Time      |
|-----------|-----------|
| einsum    | 140 µs    |
| tensordot | 100 µs    |
| for loop  | 20,000 µs |