import numpy as np # np is the convention
Unlike lists, arrays hold values that all share the same type.
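As a small illustration of this point, the sketch below builds an array from a list of mixed types. The variable names `mixed_list` and `arr` are only for the example; the conversion to a single common type is what matters.

```python
import numpy as np

mixed_list = [1, 2.5, True]     # a list keeps each element's own type
arr = np.array(mixed_list)      # the array converts everything to one common dtype

print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'bool']
print(arr.dtype)                               # float64
print(arr)
```

Here the integer and the boolean are both promoted to floats so that every element fits the same type.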
Creating an array¶
a = np.array([[1,2,3], [4,5,7]]) # 2D array built from a list
print(a)
print("shape: ",a.shape)
print("type: ",type(a[0,0]))
[[1 2 3]
 [4 5 7]]
shape:  (2, 3)
type:  <class 'numpy.int64'>
dtype: the choice of weapons¶
In scientific computing it is important to choose the data type of an array carefully, because it determines memory use, numerical error (overflow, rounding), and speed.
NumPy offers the following types:
Data type | Description |
---|---|
bool | Boolean (True or False) stored as a byte |
int | Platform integer (usually either int32 or int64) |
int8 | Byte (-128 to 127) |
int16 | Integer (-32768 to 32767) |
int32 | Integer (-2147483648 to 2147483647) |
int64 | Integer (-9223372036854775808 to 9223372036854775807) |
uint8 | Unsigned integer (0 to 255) |
uint16 | Unsigned integer (0 to 65535) |
uint32 | Unsigned integer (0 to 4294967295) |
uint64 | Unsigned integer (0 to 18446744073709551615) |
float | Shorthand for float64. |
float16 | Half precision float: sign bit, 5 bits exponent, 10-bit mantissa |
float32 | Single precision float: sign bit, 8 bits exponent, 23-bit mantissa |
float64 | Double precision float: sign bit, 11 bits exponent, 52-bit mantissa |
complex | Shorthand for complex128. |
complex64 | Complex number, represented by two 32-bit floats (real and imaginary components) |
complex128 | Complex number, represented by two 64-bit floats (real and imaginary components) |
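The limits listed in the table do not have to be memorized: a minimal sketch, using `np.iinfo` for integer types and `np.finfo` for floating-point types, can print them. The selection of types in the loops is arbitrary.

```python
import numpy as np

# Integer types: smallest and largest representable values
for t in (np.int8, np.uint16, np.int64):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Floating-point types: size in bits, mantissa bits, machine epsilon, largest value
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.nmant, info.eps, info.max)
```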
x = np.arange(4, dtype= np.uint8) # arange is the numpy version of range to produce an array
print("x =", x, x.dtype)
x[0] = -2 # wraps around to 256 - 2 = 254 for uint8 (NumPy 2.x raises OverflowError here instead)
print("x =", x, x.dtype)
y = x.astype('float32') # conversion
print("y =", y, y.dtype)
x = [0 1 2 3] uint8
x = [254 1 2 3] uint8
y = [254. 1. 2. 3.] float32
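Because a conversion can silently lose information, it may help to ask NumPy whether a cast is safe before doing it. The sketch below uses `np.can_cast` and the `casting` parameter of `astype`; the flag name `raised` is only for the example.

```python
import numpy as np

x = np.arange(4, dtype=np.uint8)

print(np.can_cast(np.uint8, np.float32))   # True: every uint8 value fits in a float32
print(np.can_cast(np.float32, np.uint8))   # False: fractions and large values would be lost

# astype accepts a casting rule; "safe" refuses conversions that may lose information
try:
    x.astype(np.int8, casting="safe")      # 128..255 cannot be stored in an int8
    raised = False
except TypeError:
    raised = True
print("unsafe cast refused:", raised)
```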
We can query the memory size (in bytes) of a single element, and of the whole array with nbytes:
print(x[0].itemsize)
print(y[0].itemsize)
y.nbytes
1
4
16
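To make the memory impact of the dtype concrete, here is a short sketch comparing the same data stored as float64 and as float32. The array length `n` is an arbitrary choice for the example.

```python
import numpy as np

n = 1_000_000                       # arbitrary size for the illustration
big64 = np.zeros(n, dtype=np.float64)
big32 = big64.astype(np.float32)    # same values, half the storage

# nbytes is simply the number of elements times the size of one element
print(big64.itemsize, big64.nbytes)   # 8 8000000
print(big32.itemsize, big32.nbytes)   # 4 4000000
print(big64.nbytes == big64.size * big64.itemsize)  # True
```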
Predefined methods¶
You can also create an uninitialized or prefilled array of any shape using the predefined creation functions:
a = np.empty((2,2), dtype=float) # empty does not initialize the values, which makes it faster
print("Empty float:\n", a)
print("Float zeros:\n", np.zeros((2,2), dtype=float)) # matrix filled with 0
print("Complex ones:\n", np.ones((2,3), dtype=complex)) # matrix filled with 1
print("Full of 3.2:\n", np.full((2,2), 3.2))
print("The following matrix is only partially displayed because it is too large:")
print(np.identity(1000)) # identity matrix
Empty float:
 [[2.42350504e-316 0.00000000e+000]
 [4.94065646e-324 nan]]
Float zeros:
 [[0. 0.]
 [0. 0.]]
Complex ones:
 [[1.+0.j 1.+0.j 1.+0.j]
 [1.+0.j 1.+0.j 1.+0.j]]
Full of 3.2:
 [[3.2 3.2]
 [3.2 3.2]]
The following matrix is only partially displayed because it is too large:
[[1. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 1.]]
print("Random integers < 10:\n", np.random.randint(10, size=(3,4))) # a lower bound can also be given
print("Random reals between 0 and 1 :\n", np.random.random(size=(3,4)))
Random integers < 10:
 [[7 5 7 2]
 [6 7 0 8]
 [6 3 8 3]]
Random reals between 0 and 1 :
 [[0.45228105 0.39674551 0.02142443 0.14342572]
 [0.95223097 0.85798264 0.85777533 0.28784271]
 [0.5142878 0.36823048 0.87704756 0.44096786]]
The functions above draw from a uniform distribution; other probability distributions are also available.
loc = 3
scale = 1.5
np.random.normal(loc, scale, size=(2,3)) # Gaussian (normal) distribution with mean loc and standard deviation scale
array([[1.44216124, 2.5586318 , 4.28770816],
       [2.16814785, 1.82332334, 4.53031302]])
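The random values shown here change at every run. When reproducible values are needed, one option is NumPy's `Generator` interface, created with `np.random.default_rng`. This is a different interface from the `np.random.*` functions used in this section; the seed value 42 and the variable names are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # arbitrary seed; same seed, same sequence

ints = rng.integers(10, size=(3, 4))      # integers in [0, 10)
reals = rng.random(size=(3, 4))           # uniform reals in [0, 1)
gauss = rng.normal(loc=3, scale=1.5, size=(2, 3))  # Gaussian distribution

# A second generator with the same seed reproduces the same first draw
rng_again = np.random.default_rng(seed=42)
same_ints = rng_again.integers(10, size=(3, 4))
print(np.array_equal(ints, same_ints))    # True
```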
By redefining its shape¶
A classic case for testing is to create a small multidimensional array with distinct values.
The simplest way is to put 0, 1, 2, ..., N in the cells of an array of shape (3,4), for example.
This is done with arange, which we have already seen, to generate the values and reshape to give them the desired shape:
arr = np.arange(3*4).reshape((3,4))
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
arr[1, 0]
4
Beware: the cells of an array are numbered as if by nested loops, and it is always the last dimension that varies fastest. In 3D this means that going from one element to the next increments the last index (along z). This does not match the way people usually fill a cube (we tend to stack 2D arrays). In regular use this is not a problem; it is only surprising when printing small test arrays to check them.
    human          NumPy
    ┌─┬─┐          ┌─┬─┐
  ┌─┬─┐│5│       ┌─┬─┐│3│
  │0│1│┼─┤       │0│2│┼─┤
  ├─┼─┤│7│       ├─┼─┤│7│
  │2│3│┴─┘       │4│6│┴─┘
  └─┴─┘          └─┴─┘
A = np.arange(8).reshape(2,2,2) # A[0,1,0] is 2 and A[0,1,1] is 3
A
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
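The "last index varies fastest" rule can be checked directly. The sketch below uses `np.unravel_index` to convert a position in the flat sequence 0..7 into the (i, j, k) index of the cube, and `np.ravel_multi_index` for the opposite direction.

```python
import numpy as np

A = np.arange(8).reshape(2, 2, 2)

# Walk through the flat positions and show the matching 3D index
for n in range(A.size):
    i, j, k = np.unravel_index(n, A.shape)
    print(n, "->", (int(i), int(j), int(k)), "value:", A[i, j, k])

# For a (2, 2, 2) array the flat position is 4*i + 2*j + k
print(np.ravel_multi_index((1, 0, 1), A.shape))   # 5
```

The printed indices go (0,0,0), (0,0,1), (0,1,0), ...: the last index changes at every step, the first one only once.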
The inverse of reshape is flatten(), which turns a multi-dimensional array into a new 1-dimensional array. We can also use flat to get a 1D iterator over the array without reshaping or copying it.
print(A.flat[5])
print(A.shape)
5
(2, 2, 2)
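The difference between a copy and a view matters as soon as values are modified. The sketch below compares `flatten()`, `flat`, and also `ravel()`, a method not mentioned above that returns a view when the memory layout allows it. The values 100, 200 and 300 are arbitrary markers.

```python
import numpy as np

A = np.arange(8).reshape(2, 2, 2)

copy_1d = A.flatten()   # always a new, independent array
view_1d = A.ravel()     # a view on the same memory here, because A is contiguous

copy_1d[0] = 100        # does not touch A
view_1d[1] = 200        # also changes A[0, 0, 1]
A.flat[2] = 300         # writing through flat changes A[0, 1, 0]

print(A[0, 0, 0], A[0, 0, 1], A[0, 1, 0])   # 0 200 300
print(np.shares_memory(A, copy_1d), np.shares_memory(A, view_1d))  # False True
```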
Mixing values¶
If we want to work with the values of an array in random order, we can shuffle the array with np.random.permutation(). Beware: the permutation is applied to the first-level elements of the array, A[:], i.e. to whole sub-arrays (rows) if the array is multi-dimensional (an array of arrays).
data = np.arange(12).reshape(3,4)
np.random.permutation(data) # permutes rows only
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [ 0,  1,  2,  3]])
If you want to shuffle all the values, you have to flatten the array, permute it, and give it back its shape:
data = np.random.permutation(data.flatten()).reshape(data.shape) # works on copies: flatten and permutation both return new arrays
data
array([[10,  3, 11,  8],
       [ 4,  1,  2,  7],
       [ 0,  9,  6,  5]])
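Since the result is random, it is easier to check properties than exact values. The sketch below verifies that a row permutation keeps each row intact, that a full shuffle keeps the same set of values, and that the original array is left untouched. The names `rows_shuffled`, `all_shuffled`, `rows_intact` and `same_values` are only for the example.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

rows_shuffled = np.random.permutation(data)     # shuffles whole rows
all_shuffled = np.random.permutation(data.flatten()).reshape(data.shape)

# Every shuffled row is still one of the original rows
rows_intact = all(any(np.array_equal(r, o) for o in data) for r in rows_shuffled)

# The full shuffle contains exactly the same values, in some other order
same_values = np.array_equal(np.sort(all_shuffled, axis=None), data.flatten())

print(rows_intact, same_values)   # True True
print(data[0])                    # [0 1 2 3]: permutation did not modify data
```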
With its own function¶
We can also create an array with a function that gives the value of the array for each index (i, j):
def f(i,j):
    return 2*i - j
np.fromfunction(f, shape=(3,4), dtype=int)
array([[ 0, -1, -2, -3],
       [ 2,  1,  0, -1],
       [ 4,  3,  2,  1]])
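One detail worth knowing: `np.fromfunction` does not call the function once per cell. It calls it a single time with whole arrays of indices, so the function must be written with array operations. The sketch below records the calls in a list named `calls` (a name chosen for the example) and compares the result with the same formula applied to `np.indices`.

```python
import numpy as np

calls = []

def f(i, j):
    calls.append(i.shape)   # i and j are full index arrays, not scalars
    return 2 * i - j

result = np.fromfunction(f, shape=(3, 4), dtype=int)

# The same computation written explicitly with index arrays
i, j = np.indices((3, 4))
explicit = 2 * i - j

print(len(calls), calls[0])               # 1 (3, 4)
print(np.array_equal(result, explicit))   # True
```

A function that relies on scalar-only constructs, such as an `if` on `i`, would therefore not work as expected here.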
Basic Operations¶
NumPy applies the usual arithmetic operations to all the elements of an array at once:
A = np.array([[1,2], [3,4]])
print("A + 1:\n", A + 1, '\n')
print("2 A + Id:\n", 2 * A + np.identity(2), '\n')
print("A * A (element-wise product):\n", A * A, '\n') # or np.square(A)
print("A @ A (matrix or dot product):\n", A @ A) # or A.dot(A)
A + 1:
 [[2 3]
 [4 5]]

2 A + Id:
 [[3. 4.]
 [6. 9.]]

A * A (element-wise product):
 [[ 1  4]
 [ 9 16]]

A @ A (matrix or dot product):
 [[ 7 10]
 [15 22]]
You can transpose an array with .T. It has no effect on a 1D array; to get a row vector or a column vector, you have to write it in 2D:
v = np.array([[1,3,5]])
print(v, '\n\n', v.T, '\n')
print("Guess what v + v.T means:\n", v + v.T)
[[1 3 5]]

 [[1]
 [3]
 [5]]

Guess what v + v.T means:
 [[ 2  4  6]
 [ 4  6  8]
 [ 6  8 10]]
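The answer to the question above is broadcasting: when the shapes (1, 3) and (3, 1) are combined, each array is virtually repeated along its dimension of size 1, which gives a (3, 3) table of all pairwise sums. The sketch below spells this out; the names `table`, `w` and `col` are only for the example, and `np.newaxis` is one way to turn a 1D array into a column.

```python
import numpy as np

v = np.array([[1, 3, 5]])      # shape (1, 3): a row vector
table = v + v.T                # (1, 3) + (3, 1) -> (3, 3)
print(table.shape)             # (3, 3)
print(table[2, 0] == v[0, 0] + v[0, 2])   # True: cell (i, j) holds v[0, j] + v[0, i]

# Starting from a 1D array, a column can be made with np.newaxis
w = np.array([1, 3, 5])        # shape (3,), and w.T has the same shape
col = w[:, np.newaxis]         # shape (3, 1)
print(np.array_equal(w + col, table))     # True
```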
Trigonometric functions and other usual mathematical functions are also available.
np.set_printoptions(precision=3) # set printing precision for reals
np.sin(A)
array([[ 0.841,  0.909],
       [ 0.141, -0.757]])
Finally, NumPy offers a set of methods and functions to perform calculations on array elements:
- sum() to sum all elements,
- mean() to get the average of the elements and average() to get the weighted average,
- prod() to multiply all elements,
- min() and max() to get the minimum and maximum values,
- argmin() and argmax() to get the index of the minimum and maximum values (by default, the index in the flattened array),
- cumsum() and cumprod() for cumulative addition and multiplication,
- diff() to get the difference between consecutive elements (useful to approximate a derivative).
Most of these exist both as a method, A.sum(), and as a function, np.sum(A); average() and diff() are only available as functions, np.average(A) and np.diff(A).
A.argmax()
3
np.diff(A.flatten())
array([1, 1, 1])
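These calculations can also be restricted to one dimension with the `axis` parameter, which is not shown above. The sketch below, on the same 2x2 array `A`, computes column sums, row means, a weighted average per row with arbitrary weights [1, 3], and converts the flat `argmax()` result into a (row, column) pair.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

col_sums = A.sum(axis=0)       # sum down each column: [4 6]
row_means = A.mean(axis=1)     # mean along each row: [1.5 3.5]

# Weighted average per row: (1*1 + 3*2)/4 = 1.75 and (1*3 + 3*4)/4 = 3.75
weighted = np.average(A, axis=1, weights=[1, 3])

# argmax() returns a flat position; unravel_index turns it into (row, column)
row, col = np.unravel_index(A.argmax(), A.shape)

print(col_sums, row_means, weighted)
print(int(row), int(col))      # 1 1
print(A.cumsum())              # [ 1  3  6 10]
```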
Browse an array¶
The natural way to iterate over all the elements of a multi-dimensional array is to write one loop per dimension:
a = np.arange(6).reshape(3,2)
for ligne in a:
    for element in ligne:
        print(" ", element, end="") # end="" suppresses the newline after each print
    print()
  0  1
  2  3
  4  5
We saw when manipulating the shape of an array that it can be flattened. Rather than flatten(), which builds a new 1-dimensional array, we prefer flat, which gives an iterator over all the elements of the array.
for v in a.flat:
    print(v)
0
1
2
3
4
5
It is also possible to loop over the indices, but it is clearly less elegant.
for i in range(len(a)):
    for j in range(len(a[i])):
        print(a[i,j])
0
1
2
3
4
5
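When both the index and the value are needed, a possible middle ground is `np.ndenumerate`, which yields (index, value) pairs for an array of any dimension without nested loops. The list `pairs` is only there to collect the results in this sketch.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)

pairs = []
for (i, j), value in np.ndenumerate(a):
    pairs.append((i, j, int(value)))   # index tuple and the element at that index
    print(f"a[{i},{j}] = {value}")
```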
Think vector¶
Writing loops is often our natural way of thinking, but it is not efficient. It is better to work directly on the whole array. So rather than writing the loop
for i in range(len(x)):
    z[i] = x[i] + y[i]
we will do
z = x + y
Not only is it more readable but it is also faster.
def double_loop(a):
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            a[i,j] = np.sqrt(a[i,j]) # change in-place

def iterate(a):
    for x in a.flat:
        x = np.sqrt(x) # modification not saved in a, x is a local var
b = np.random.random(size=(200,200))
a = b.copy() # we need a copy to be sure to use the same data each time
%timeit double_loop(a)
a = b.copy()
%timeit iterate(a)
a = b.copy()
%timeit np.sqrt(a) # vectorized operation, about 1000 times faster
45.5 ms ± 494 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
35.3 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
36.9 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
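`%timeit` only works inside IPython or Jupyter. Outside of them, a similar comparison can be sketched with the standard `timeit` module. This version of `double_loop` writes into a new array `out` instead of modifying its argument, so that both approaches can be compared on the same data; the repetition counts (3 and 300) are arbitrary, and the measured times depend on the machine.

```python
import timeit
import numpy as np

def double_loop(a):
    """Element-by-element square root, written with explicit loops."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = np.sqrt(a[i, j])
    return out

b = np.random.random(size=(200, 200))

# Both approaches compute the same values
loop_result = double_loop(b)
vector_result = np.sqrt(b)
print(np.allclose(loop_result, vector_result))   # True

# Average time per call, in seconds
t_loop = timeit.timeit(lambda: double_loop(b), number=3) / 3
t_vec = timeit.timeit(lambda: np.sqrt(b), number=300) / 300
print(f"loop: {t_loop * 1e3:.2f} ms, vectorized: {t_vec * 1e6:.1f} µs")
```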