import numpy as np
import numpy.linalg as lin
import matplotlib.pylab as plt
import plotly.offline as py
import plotly.graph_objects as go
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
np.set_printoptions(precision=3, linewidth=150, suppress=True)
plt.style.use(['seaborn-whitegrid','data/cours.mplstyle'])
Optimization problem¶
An optimization problem looks like this:
Given a function $J : ℝ^n \rightarrow ℝ$, find the minimum of $J$ i.e. find ${\bf u}$ such that
$$J({\bf u}) = \inf_{{\bf v} \in ℝ^n} J({\bf v})$$
Optimization problem with constraint¶
It is possible to search for ${\bf u}$ not in all of $ℝ^n$ but only in a part of $ℝ^n$; this is then a constrained optimization problem.
Example We are in 2D and we are looking for the minimum of $J(x,y)$ but with the constraint that $y > x$. This amounts to searching only in the part of $ℝ^2$ where $y > x$.
We will not look at constrained optimization problems in this course, but they are important problems that you will encounter later in your studies.
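For the curious, here is a minimal sketch (outside the scope of this course) of how such a constraint could be passed to scipy.optimize.minimize, assuming SciPy is installed; the function and the constraint below are purely illustrative.
from scipy.optimize import minimize

def f(v):
    # illustrative function: its unconstrained minimum (2, -1) violates y > x
    return (v[0] - 2)**2 + (v[1] + 1)**2

# inequality constraint fun(v) >= 0, i.e. y - x >= 0 (the boundary y = x is included)
constraint = {'type': 'ineq', 'fun': lambda v: v[1] - v[0]}

res = minimize(f, x0=[0.0, 0.0], constraints=[constraint])
print(res.x)  # the constrained minimum lies on the line y = x, around (0.5, 0.5)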
The gradient method¶
One way to solve a 2D optimization problem is to imagine it as terrain with relief. The function $J(x,y)$ represents the altitude at any point $(x,y)$.
To find the minimum of $J$, just start from a random point and go down in the direction of steepest descent. A drop of water follows exactly this path.
This is the gradient method.
To fully understand the method, it is necessary to fully understand the partial derivatives as well as the notations which are summarized in this memo on partial derivatives.
The gradient algorithm is therefore:
- take a random starting point ${\bf p^0} = (x_0, y_0)$
- calculate the gradient of J at this point
$$ \nabla J(x_0, y_0) = \begin{bmatrix} \frac{\partial J}{\partial x} \\ \frac{\partial J}{\partial y} \end{bmatrix} (x_0, y_0) $$
- move in the opposite direction (the gradient points uphill): ${\bf p}^{k+1} = {\bf p}^k - \mu \, \nabla J({\bf p}^k)$
and we repeat this last step until we arrive at a fixed point, i.e. until $|| {\bf p}^{k+1} - {\bf p}^k|| < \varepsilon$ with $\varepsilon$ a very small value.
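To make the update rule concrete, here is a single step worked out by hand for the function $J(x,y) = x^2 + 0.5\,y^2 - 2x + 3$ defined just below, starting from ${\bf p}^0 = (0,0)$ with $\mu = 0.1$ (these numbers are only for illustration):
$$ \nabla J(0, 0) = \begin{bmatrix} 2x - 2 \\ y \end{bmatrix}_{(0,0)} = \begin{bmatrix} -2 \\ 0 \end{bmatrix} \qquad\Rightarrow\qquad {\bf p}^1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - 0.1 \begin{bmatrix} -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0 \end{bmatrix} $$
The point has moved towards the minimum, which (as we will see) is at $(1, 0)$.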
def J(x, y):
return x**2 + 0.5 * y**2 - 2 * x + 3
x = np.linspace(-3,3,100)
y = np.linspace(-3,3,100)
mx, my = np.meshgrid(x,y)
mz = J(mx, my)
trace = go.Surface(x=mx, y=my, z=mz)
layout = go.Layout(scene = {'aspectmode':'data'})
fig = go.Figure(data=[trace], layout=layout)
py.iplot(fig, show_link=False)
def grad_J(x,y):
    return np.array([2*x-2, y])  # computed by hand from J (we would like this to be automatic)
If you don't understand $J$'s gradient calculation, look no further. Find your course on this subject, look at the memo, ask questions.
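As a quick sanity check (and a hint of what an automatic approach would do), we can compare the hand-computed gradient with a centred finite-difference approximation; the test point and the step h below are arbitrary.
def grad_J_numeric(x, y, h=1e-6):
    # centred finite differences: (J(x+h) - J(x-h)) / 2h in each direction
    return np.array([(J(x + h, y) - J(x - h, y)) / (2 * h),
                     (J(x, y + h) - J(x, y - h)) / (2 * h)])

print(grad_J(0.5, -1.2))          # gradient computed by hand
print(grad_J_numeric(0.5, -1.2))  # numerical approximation, should match closely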
# Gradient algorithm to find the minimum
x = np.array([0,0])  # a random point, change it to see what happens
µ = 0.1              # the smaller it is, the slower we move, see the formula above
e = 0.0001           # my epsilon for the stopping condition
while True:
    x_old = x
    x = x - µ * grad_J(*x)  # *x passes all the values of x as arguments, so x[0] as 1st argument and x[1] as 2nd
    if np.square(x_old - x).sum() < e**2:
        break
print("The minimum is at point ", x, "The value of J at this point is ", J(*x))
The minimum is at point [1. 0.] The value of J at this point is 2.0000001053122918
I invite you to go back to the figure above, zoom in to see what happens, and move the mouse over it. You should see an ellipse that shows the level line (all the points having the same z value). The minimum is reached when this ellipse shrinks to a single point. It is precisely at [1, 0, 2].
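If you prefer a static view of these level lines, here is a small Matplotlib sketch reusing the grid mx, my, mz computed above (the number of levels is arbitrary).
# Level lines (contours) of J: the ellipses shrink to a point at the minimum (1, 0)
plt.contour(mx, my, mz, levels=20)
plt.plot(1, 0, 'ro')  # the minimum found by the gradient algorithm
plt.axis('equal')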
Study of the convergence of the gradient¶
We store all the points visited between our starting point and the solution so that we can draw convergence curves.
The result of our gradient method must therefore be a sequence of points.
def minimum_J(start_value, µ=0.1, e=0.001):
    x = [np.array(start_value)]
    while True:
        x.append(x[-1] - µ * grad_J(*x[-1]))
        if np.square(x[-1] - x[-2]).sum() < e**2:
            break
        # what follows is only safety checks
        if np.square(x[-1] - x[-2]).sum() > 1E6:  # in case we diverge
            print("DIVERGE")
            break
        if len(x) > 200:  # this is taking too long, I fear an infinite loop
            print('Too long, infinite loop?')
            break
    return np.array(x)
x = minimum_J(start_value = (0,1))  # I take an initial value that is not aligned with the solution
plt.plot(x[:,0], x[:,1], 'x:')
[<matplotlib.lines.Line2D at 0x7fbc5e968c50>]
fig = go.Figure(data=[go.Scatter3d(x=x[:,0], y=x[:,1], z=J(x[:,0], x[:,1]), marker={'size':3})])
fig.show()
Nice, isn't it?
Impact of µ¶
Let's see how µ influences the convergence, i.e. once we have the direction of steepest descent at a point, how far should we move along it. With a large µ we take big steps, with a small µ we take small steps.
With µ = 0.1 we have taken small steps.
# µ = 2
x = minimum_J(start_value = (0,1), µ = 2)
plt.plot(x[:,0], x[:,1], 'x:')
DIVERGE
[<matplotlib.lines.Line2D at 0x7fbc5c93f9b0>]
fig = go.Figure(data=[go.Scatter3d(x=x[:,0], y=x[:,1], z=J(x[:,0], x[:,1]), marker={'size':3})])
fig.show()
Not good, we diverged. The steps are too big: we are going in the right direction, but we go so far that we end up on the other side of the bowl.
We can see that in y we go from 1 to -1, then back to 1, -1, etc., without ever settling at 0, which is the solution for y.
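To check this, we can print the first iterates of the diverging run (x still holds the points computed above with µ = 2):
print(x[:5, 1])  # the y coordinate alternates between 1 and -1
print(x[:5, 0])  # the x coordinate grows in absolute value at every step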
Question Why does the value of x explode instead of oscillating like y?
# µ = 0.8
x = minimum_J(start_value = (0,1), µ = 0.8)
plt.plot(x[:,0], x[:,1], 'x:')
[<matplotlib.lines.Line2D at 0x7fbc5c54e080>]
fig = go.Figure(data=[go.Scatter3d(x=x[:,0], y=x[:,1], z=J(x[:,0], x[:,1]), marker={'size':3})])
fig.show()
We converged but it's less pretty than the first time. On the other hand
len(x)
17
we converged in 17 iterations. Let's see how many iterations it took for µ = 0.1:
x = minimum_J(start_value = (0,1), µ = 0.1)
len(x)
46
µ = 0.8 gives a less attractive curve but we converge almost 3 times faster than with µ = 0.1. So it's better.
# µ = 1
x = minimum_J(start_value = (0,1), µ = 1)
plt.plot(x[:,0], x[:,1], 'x:')
Too long, infinite loop?
[<matplotlib.lines.Line2D at 0x7fbc5ec1cc88>]
fig = go.Figure(data=[go.Scatter3d(x=x[:,0], y=x[:,1], z=J(x[:,0], x[:,1]), marker={'size':3})])
fig.show()
We started from [0,1], went to [2,0], then to [0,0], then back to [2,0].
Since we have seen that from [2,0] we go to [0,0], this means that we oscillate forever between [0,0] and [2,0].
Knowing that the solution is [1,0], this means that we are going in the right direction but each step is twice too big.
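We can verify this last statement on the x coordinate. With $\mu = 1$ the update is
$$ x_{k+1} = x_k - \mu\,(2 x_k - 2) = 2 - x_k, \qquad \text{i.e.} \qquad x_{k+1} - 1 = -(x_k - 1), $$
so the distance to the solution $x = 1$ never shrinks, it only changes sign, hence the endless oscillation between 0 and 2. For this coordinate, the step that lands exactly on the solution is $\mu = 0.5$, half of the value used.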
Moral The value of µ is important. If it is too small we waste time, if it is too big we cannot find the solution.
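As a last small experiment (reusing minimum_J defined above; the list of values below is arbitrary), we can compare the number of iterations needed for several step sizes:
# Number of iterations until convergence for a few values of µ
for mu in [0.05, 0.1, 0.3, 0.5, 0.8]:
    points = minimum_J(start_value=(0, 1), µ=mu)
    print("µ =", mu, ":", len(points), "iterations")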