By convention, the NumPy package is usually imported as np:
import numpy as np
The basic building block is the ndarray class provided by the NumPy package.
np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
An ndarray is an n-dimensional array of values of a specific data type:
np.array([1, 2, 3, 4], dtype=np.int64)
array([1, 2, 3, 4])
np.array([1.1, 2.2, 3.3, 4.4], dtype=np.float64)
array([ 1.1, 2.2, 3.3, 4.4])
np.array([1.1 + 2.2j, -3.3, 4.4j], dtype=np.complex64)
array([ 1.10000002+2.20000005j, -3.29999995+0.j , 0.00000000+4.4000001j ], dtype=complex64)
The shape of the ndarray describes the size of each dimension.
scalar = np.array(3)
print("Scalar", scalar.shape); print(scalar)
Scalar ()
3
vector = np.array([1.0, 2.0, 3.0])
print("Vector", vector.shape); print(vector)
Vector (3,)
[ 1.  2.  3.]
matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print("Matrix", matrix.shape); print(matrix)
Matrix (3, 3)
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]
tensor = np.array([[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
[[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]]])
print("Tensor", tensor.shape); print(tensor)
Tensor (1, 2, 2, 3)
[[[[ 1.  2.  3.]
   [ 4.  5.  6.]]

  [[ 5.  6.  7.]
   [ 8.  9. 10.]]]]
np.ones((2,2))
array([[ 1.,  1.],
       [ 1.,  1.]])
np.linspace(-1, 1, 21)
array([-1. , -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
np.random.normal(size=(1000,3))
array([[ 0.55700821,  0.4070244 ,  0.78926285],
       [ 0.77834488, -0.67959165,  0.30710418],
       [-0.75194435,  1.16922865,  0.16003823],
       ...,
       [ 0.68701632,  0.27558596,  0.29430687],
       [ 1.36337253,  0.46857553, -0.90007435],
       [-0.27374651, -0.67685141,  0.18652032]])
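There are many more routines for creating arrays; a few common ones, shown here as a quick sketch:
np.zeros((3, 2))       # array of the given shape filled with zeros
np.arange(0, 10, 2)    # like range(), but returns an ndarray
np.eye(3)              # 3x3 identity matrix
np.full((2, 2), 7.0)   # array filled with a constant value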
A ufunc (universal function) operates on an ndarray in an element-by-element fashion:
vector + vector
array([ 2., 4., 6.])
vector ** 2
array([ 1., 4., 9.])
np.sin(vector)
array([ 0.84147098, 0.90929743, 0.14112001])
np.sin(vector)**2 + np.cos(vector)**2
array([ 1., 1., 1.])
If you apply a ufunc to ndarrays with different shapes, NumPy will broadcast them for you.
https://docs.scipy.org/doc/numpy-1.10.0/user/basics.broadcasting.html
r = scalar + vector
print(scalar.shape, "+", vector.shape, "=", r.shape)
print(scalar, "+", vector, "=", r)
() + (3,) = (3,)
3 + [ 1.  2.  3.] = [ 4.  5.  6.]
r = vector + vector
print(vector.shape, "+", vector.shape, "=", r.shape)
print(vector, "+", vector, "=", r)
(3,) + (3,) = (3,)
[ 1.  2.  3.] + [ 1.  2.  3.] = [ 2.  4.  6.]
r = matrix + vector
print(matrix.shape, "+", vector.shape, "=", r.shape)
print(matrix, "+", vector, "=\n", r)
(3, 3) + (3,) = (3, 3)
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]] + [ 1.  2.  3.] =
 [[ 2.  4.  6.]
 [ 5.  7.  9.]
 [ 8. 10. 12.]]
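Broadcasting also works across axes of different lengths; for instance, a column vector of shape (3, 1) combined with the vector of shape (3,) yields a (3, 3) result (a small sketch of standard NumPy behaviour):
column = vector.reshape((3, 1))
r = column * vector
print(column.shape, "*", vector.shape, "=", r.shape); print(r)
(3, 1) * (3,) = (3, 3)
[[ 1.  2.  3.]
 [ 2.  4.  6.]
 [ 3.  6.  9.]]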
A ufunc can be used to reduce an ndarray by one dimension by applying it along one axis:
print(vector)
np.add.reduce(vector)
[ 1. 2. 3.]
6.0
print(matrix)
np.add.reduce(matrix)
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]
array([ 12., 15., 18.])
np.add.reduce(matrix, axis=1)
array([ 6., 15., 24.])
A ufunc can be used to accumulate the intermediate results of applying it to all elements along an axis:
print(vector)
np.add.accumulate(vector)
[ 1. 2. 3.]
array([ 1., 3., 6.])
print(matrix)
np.multiply.accumulate(matrix)
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]
array([[   1.,    2.,    3.],
       [   4.,   10.,   18.],
       [  28.,   80.,  162.]])
Most of the time you will use the predefined shortcuts and convenience functions:
np.sum(vector)
6.0
np.cumsum(vector)
array([ 1., 3., 6.])
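A few more of these convenience functions; they all accept an axis argument (a quick sketch):
np.mean(matrix)          # 5.0, mean of all elements
np.max(matrix, axis=0)   # array([ 7.,  8.,  9.]), column-wise maximum
np.min(matrix, axis=1)   # array([ 1.,  4.,  7.]), row-wise minimum
np.prod(vector)          # 6.0, product of all elements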
You can reshape, concatenate, and split arrays.
https://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html
np.reshape(matrix, (9,))
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9.])
np.vstack([matrix, vector])
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.],
       [ 1.,  2.,  3.]])
np.vsplit(matrix, 3)
[array([[ 1., 2., 3.]]), array([[ 4., 5., 6.]]), array([[ 7., 8., 9.]])]
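Analogous routines work along the horizontal direction or an arbitrary axis (a brief sketch):
np.hstack([matrix, matrix])        # shape (3, 6)
np.concatenate([vector, vector])   # array([ 1.,  2.,  3.,  1.,  2.,  3.])
np.hsplit(matrix, 3)               # three arrays of shape (3, 1)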
You can access individual elements or parts of ndarrays in various ways.
This is probably the most difficult part to get right in terms of performance.
https://docs.scipy.org/doc/numpy/reference/routines.indexing.html
print(vector[0], vector[1], vector[2])
1.0 2.0 3.0
print(vector[[True, False, True]])
[ 1. 3.]
print(matrix[:2, 1:])
[[ 2.  3.]
 [ 5.  6.]]
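Boolean masks and integer index arrays work on matrices as well (a small sketch):
print(matrix[matrix > 5.0])   # boolean mask selects matching elements
print(vector[[2, 0, 1]])      # integer (fancy) indexing picks and reorders elements
[ 6.  7.  8.  9.]
[ 3.  1.  2.]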
for x in vector:
    print(x)
1.0
2.0
3.0
x = np.arange(1, 1000001)
%%timeit
mysum = 0
for element in x:
    mysum += element
10 loops, best of 3: 128 ms per loop
%timeit sum(x)
10 loops, best of 3: 62.8 ms per loop
%timeit np.sum(x)
1000 loops, best of 3: 564 µs per loop
Consider two two-dimensional Gaussian distributions for signal and background:
N = 1000
signal = np.random.multivariate_normal([-1.0,-1.0],[[1.0,0.5],[0.5,1.0]],(N,))
bckgrd = np.random.multivariate_normal([1.0,1.0],[[1.0,-0.5],[-0.5,1.0]],(N,))
print(signal.shape, bckgrd.shape)
plot_data(signal, bckgrd)
(1000, 2) (1000, 2)
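The helper plot_data is not defined in this excerpt; a minimal sketch of such a function, assuming it simply scatter-plots both samples with matplotlib (the fisher and neuralnet keyword arguments used below are omitted here):
import matplotlib.pyplot as plt

def plot_data(signal, bckgrd):
    # Scatter-plot signal and background in the two-dimensional feature space
    plt.scatter(signal[:, 0], signal[:, 1], alpha=0.5, label="signal")
    plt.scatter(bckgrd[:, 0], bckgrd[:, 1], alpha=0.5, label="background")
    plt.legend()
    plt.show()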
We calculate Fisher's discriminant: $$ \vec{d} = \left( \Sigma_S + \Sigma_B \right)^{-1} \left( \vec{\mu}_S - \vec{\mu}_B \right) $$
from numpy.linalg import inv
numerator = np.mean(signal, axis=0) - np.mean(bckgrd, axis=0)
denominator = np.cov(signal.T) + np.cov(bckgrd.T)
fisher = np.dot(inv(denominator), numerator)
plot_data(signal, bckgrd, fisher=fisher)
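To visualise the separation, one can project both samples onto the Fisher direction and histogram the result (a short sketch, using matplotlib as above):
signal_proj = np.dot(signal, fisher)
bckgrd_proj = np.dot(bckgrd, fisher)
plt.hist(signal_proj, bins=50, alpha=0.5, label="signal")
plt.hist(bckgrd_proj, bins=50, alpha=0.5, label="background")
plt.legend(); plt.show()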
Feed Forward
$$ a_h = \sum_i w_{hi} x_i $$
$$ x_h = \tanh{a_h} $$
$$ a_o = \sum_h w_{oh} x_h $$
$$ x_o = \tanh{a_o} $$
Backpropagation Algorithm
$$ \frac{\mathrm{d}E}{\mathrm{d}w} = \frac{\partial E}{\partial x} \cdot \frac{\partial x}{\partial a} \cdot \frac{\partial a}{\partial w} $$
Output Layer
$$ E \left( x_o \right) = \frac{1}{2} \left( x_o - t \right)^2 $$
Hidden Layer
$$ E \left( x_h \right) = E \left( x_o(x_h) \right) $$
class NeuralNetwork(object):
    def __init__(self, n_input, n_hidden):
        self.w_hi = np.random.normal(size=(n_input, n_hidden))
        self.w_oh = np.random.normal(size=n_hidden)

    def fit(self, x_i, t):
        for step in range(1, 1000):
            # Propagate input through hidden and output layer
            a_h = np.dot(x_i, self.w_hi)
            x_h = np.tanh(a_h)
            a_o = np.dot(x_h, self.w_oh)
            x_o = np.tanh(a_o)
            # Back-propagate error signal
            # d is the derivative of tanh: 4 cosh(x)^2 / (cosh(2x) + 1)^2 = 1 / cosh(x)^2
            d = lambda x: 4*np.cosh(x)**2/(np.cosh(2*x) + 1)**2
            e_o = d(a_o)*(x_o - t)
            e_h = d(a_h)*np.outer(e_o, self.w_oh)
            # Update weights
            self.w_oh -= 0.01 * np.dot(e_o.T, x_h)
            self.w_hi -= 0.01 * np.dot(x_i.T, e_h)

    def predict(self, x_i):
        # Propagate input through hidden layer
        x_h = np.tanh(np.dot(x_i, self.w_hi))
        # Propagate output of hidden layer through output layer
        return np.tanh(np.dot(x_h, self.w_oh))
nn = NeuralNetwork(2, 4)
nn.fit(np.vstack([signal, bckgrd]),
       np.hstack([np.ones(N), -np.ones(N)]))
plot_data(signal, bckgrd, neuralnet=nn)
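As a quick sanity check, one can compare the sign of the network output with the training labels (a sketch; this evaluates on the training sample itself):
pred = nn.predict(np.vstack([signal, bckgrd]))
labels = np.hstack([np.ones(N), -np.ones(N)])
print("Training accuracy:", np.mean(np.sign(pred) == labels))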
Calculate $\pi$ using the formula of Leibniz $$ 4 \cdot \sum_{k=1}^\infty \frac{(-1)^{k+1}}{2 k - 1}$$
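One possible vectorized approach, truncating the series after a finite number of terms (a sketch, not necessarily the intended solution):
k = np.arange(1, 1000001)
print(4 * np.sum((-1.0)**(k + 1) / (2*k - 1)))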
Check the central limit theorem by adding 100 uniformly distributed numbers between 0 and 1. $$ y = \sum_{i=1}^{100} x_i$$
Do this 10000 times, standardise the obtained values $y' = \frac{y - \mu_y}{\sigma_y}$ and plot the histogram using
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(y_array)
Compare the distribution with 10000 random numbers directly drawn from np.random.normal.
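One possible way to set this up without explicit Python loops, reusing the matplotlib import from above (a sketch):
y = np.sum(np.random.uniform(size=(10000, 100)), axis=1)
y_array = (y - np.mean(y)) / np.std(y)
plt.hist(y_array, bins=50, alpha=0.5, label="standardised sums")
plt.hist(np.random.normal(size=10000), bins=50, alpha=0.5, label="np.random.normal")
plt.legend()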
Implement the sieve of Eratosthenes to find all the prime numbers up to 1000 without using any Python loops.
Hint: My personal solution is a one-liner and uses the following three functions: