Commit 03cceb64 authored by Lars Sowa

Add KNN handson & google/doc questions

parent 798dbc7a
%% Cell type:markdown id: tags:
# Introduction to Numpy
%% Cell type:markdown id: tags:
Numpy is an open source package for mathematical operations on multidimensional arrays in Python. Assuming that you have basic Python knowledge, this Jupyter notebook gives you a short introduction to Numpy and an overview of important functions that you will need later for machine learning tasks. Nearly all functions shown in this notebook have an analogous counterpart in the differentiable programming package PyTorch, implemented as tensor operations.
Never forget that nearly every library has documentation, e.g. https://numpy.org/doc/stable/reference/index.html, which you should use to get more information.
This Notebook is based on:
https://cs231n.github.io/python-numpy-tutorial/#containers
https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d
%% Cell type:markdown id: tags:
### Python Lists
A Python list is an ordered collection of values. In general, these do not have to be of the same data type. You can access a list element with square brackets `[]`. The following cells give you an overview of the most important commands.
%% Cell type:markdown id: tags:
Create a list and access a specific element.
%% Cell type:code id: tags:
``` python
xs = [3, 1, 2]
xs
```
%% Cell type:code id: tags:
``` python
xs[2]
```
%% Cell type:markdown id: tags:
Negative indices count from the end of the list.
%% Cell type:code id: tags:
``` python
xs[-1]
```
%% Cell type:markdown id: tags:
Print the length of a list.
%% Cell type:code id: tags:
``` python
len(xs)
```
%% Cell type:markdown id: tags:
Lists can contain elements of different types.
%% Cell type:code id: tags:
``` python
xs[2] = 'foo'
xs
```
%% Cell type:markdown id: tags:
Add a new element to the end of the list.
%% Cell type:code id: tags:
``` python
xs.append('bar')
xs
```
%% Cell type:markdown id: tags:
Remove the last element of the list.
%% Cell type:code id: tags:
``` python
x = xs.pop()
x, xs
```
%% Cell type:markdown id: tags:
##### Hands-on
Given an empty list `x`, write a loop over consecutive integers, test whether each one is a prime number, and append it to the list if it is. Collect 40 prime numbers.
%% Cell type:code id: tags:
``` python
x = []
# your code goes here
```
%% Cell type:markdown id: tags:
#### Slicing
In machine learning, you often have to slice lists or arrays. This can be done by the `:` operator, which selects 'everything in between'.
%% Cell type:markdown id: tags:
`range` is a built-in function that produces a sequence of integers. Wrap it in `list()` to get an actual list:
%% Cell type:code id: tags:
``` python
nums = list(range(5))
nums
```
%% Cell type:markdown id: tags:
Get different slices from `nums`.
%% Cell type:code id: tags:
``` python
print(nums[2:4])
print(nums[2:])
print(nums[:2])
```
%% Cell type:markdown id: tags:
Get a slice of the whole list and one without the last element using negative indices.
%% Cell type:code id: tags:
``` python
print(nums[:])
print(nums[:-1])
```
%% Cell type:markdown id: tags:
Assign a new sub-list to a slice.
%% Cell type:code id: tags:
``` python
nums[2:4] = [8, 9]
nums
```
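%% Cell type:markdown id: tags:
The general slice form is `start:stop:step`, where `stop` is exclusive and omitted parts default to the boundaries of the list. The following is a minimal sketch of this; the list `letters` is introduced here only for illustration.
%% Cell type:code id: tags:
``` python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

every_second = letters[::2]     # step of 2, starting at index 0
middle = letters[1:4]           # indices 1, 2 and 3; index 4 is excluded
reversed_copy = letters[::-1]   # a negative step walks backwards

print(every_second)
print(middle)
print(reversed_copy)
```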
%% Cell type:markdown id: tags:
## Numpy
%% Cell type:markdown id: tags:
### Arrays
A Numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
We can initialize Numpy arrays from nested Python lists, and access elements using square brackets:
%% Cell type:code id: tags:
``` python
import numpy as np
```
%% Cell type:markdown id: tags:
Create a rank 1 array and check its shape:
%% Cell type:code id: tags:
``` python
a = np.array([1, 2, 3])
a.shape
```
%% Cell type:markdown id: tags:
Access single elements in the array and change an element.
%% Cell type:code id: tags:
``` python
a[0], a[1], a[2]
```
%% Cell type:code id: tags:
``` python
a[0] = 5
a
```
%% Cell type:markdown id: tags:
The same works with a rank 2 array.
%% Cell type:code id: tags:
``` python
b = np.array([[1,2,3],[4,5,6]])
b[0, 0], b[0, 1], b[1, 0]
```
%% Cell type:code id: tags:
``` python
b.shape
```
%% Cell type:code id: tags:
``` python
b[0, 1] = 7
b
```
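%% Cell type:markdown id: tags:
Rank, shape and element type can be read directly from an array through its `ndim`, `shape` and `dtype` attributes. The sketch below uses the illustrative names `vec`, `mat` and `mixed`, which are not part of the original notebook, and also shows that Numpy picks one common type when the input values differ.
%% Cell type:code id: tags:
``` python
import numpy as np

vec = np.array([1, 2, 3])
mat = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the rank, shape holds the size along each dimension
print(vec.ndim, vec.shape)   # 1 (3,)
print(mat.ndim, mat.shape)   # 2 (2, 3)

# Mixing integers and floats yields one common type for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)           # float64
```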
%% Cell type:markdown id: tags:
There are easy ways to initialize basic arrays, like arrays consisting of zeros or ones. Use google.com or the Numpy documentation to find these methods!
%% Cell type:code id: tags:
``` python
a = np.zeros((2,2))
b = np.ones((3,3))
a, b
# your code goes here
```
%% Cell type:markdown id: tags:
Given array `a`, can you find the function to create an array of zeros (ones) with the same shape as `a`?
%% Cell type:code id: tags:
``` python
np.ones_like(a), np.zeros_like(b)
a = np.random.rand(4,2,3)
# your code goes here
```
%% Cell type:markdown id: tags:
##### Hands-on
Numpy arrays are sliceable with the same operations we have seen for lists. Try to use them on an array of rank 3. How can you access the values along an axis (like a vector)?
%% Cell type:code id: tags:
``` python
shape = (3, 4, 5)
arr = np.arange(1, np.prod(shape) + 1).reshape(shape)
print(arr)
# your code goes here
```
%% Cell type:markdown id: tags:
#### Boolean indexing (Masking)
Boolean indexing is a fast way to access elements fulfilling a boolean condition.
%% Cell type:code id: tags:
``` python
arr = np.array([[1,2], [3, 4], [5, 6]])
arr
```
%% Cell type:markdown id: tags:
Find the elements of `arr` that are bigger than 2. The following snippet returns a Numpy array of booleans with the same shape as `arr`, where each entry of `bool_idx` tells whether the corresponding element of `arr` is > 2.
%% Cell type:code id: tags:
``` python
bool_idx = (arr > 2)
bool_idx
```
%% Cell type:markdown id: tags:
We use boolean array indexing to construct a rank 1 array consisting of the elements of `arr` that correspond to the True values of `bool_idx`.
%% Cell type:code id: tags:
``` python
arr[bool_idx]
```
%% Cell type:markdown id: tags:
We can do all of the above in a single concise statement:
%% Cell type:code id: tags:
``` python
arr[arr > 2]
```
%% Cell type:markdown id: tags:
To invert a boolean mask you can use the `~` operator:
%% Cell type:code id: tags:
``` python
arr[~bool_idx]
```
%% Cell type:markdown id: tags:
To combine masks you can use arithmetic operations such as elementwise multiplication...
%% Cell type:code id: tags:
``` python
is_odd = arr%2 == 1
arr[is_odd * bool_idx]
```
%% Cell type:markdown id: tags:
... or built-in Numpy functions ...
%% Cell type:code id: tags:
``` python
arr[np.logical_and(is_odd, bool_idx)]
```
%% Cell type:markdown id: tags:
... or the elementwise operators `&` (and) and `|` (or).
%% Cell type:code id: tags:
``` python
arr[is_odd & bool_idx]
```
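%% Cell type:markdown id: tags:
When conditions are written inline, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` or `>`. The Python keywords `and` and `or` do not work elementwise and raise a `ValueError` on arrays with more than one element. A minimal sketch, using an array `values` introduced here only for illustration:
%% Cell type:code id: tags:
``` python
import numpy as np

values = np.array([[1, 2], [3, 4], [5, 6]])

# Parentheses are required: & and | bind more tightly than comparisons
odd_and_big = (values % 2 == 1) & (values > 2)
odd_or_big = (values % 2 == 1) | (values > 2)

print(values[odd_and_big])   # [3 5]
print(values[odd_or_big])    # [1 3 4 5 6]
```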
%% Cell type:markdown id: tags:
The advantage of boolean indexing is that you can operate quickly on a subset of your array.
%% Cell type:code id: tags:
``` python
arr[is_odd] += 2
arr
```
%% Cell type:code id: tags:
``` python
arr[is_odd] = 0
arr
```
%% Cell type:code id: tags:
``` python
arr[~is_odd].sum()
```
%% Cell type:markdown id: tags:
##### Hands-on
You have an array of `names` and a second array of the same shape, `ages`, with the ages of the corresponding persons.
How old is `Maksymilian Winter`?
Which persons have age `None`? Calculate the mean age of all persons, except the ones with age `None`. After that, set the age of these persons to the previously calculated mean.
Now try to identify all persons whose age is between 28 and 32 or is divisible by 3.
%% Cell type:code id: tags:
``` python
names = np.array([['Vanessa Chapman', 'Fannie Silva', 'April Tucker', 'Jade Harrell', 'Corey Hansen', 'Alissa Lynch', 'Brogan Hamilton', 'Tamara Carver', 'Natasha King', 'Zoe Duran'],
['Calum Weeks', 'Archibald Arnold', 'Chris Fields', 'Mohsin Chase', 'Karim Reeves', 'Leighton Trevino', 'Maksymilian Winter', 'Erin Kennedy', 'Ifan Mcconnell', 'Jasper Carver']])
ages = np.array([[29, 25, 33, 20, 36, None, 32, 44, 29, 10],
[47, None, 30, 37, 36, 15, 29, 47, 27, 31]])
# your code goes here
```
%% Cell type:markdown id: tags:
#### Mathematical Operations on Arrays
%% Cell type:markdown id: tags:
All common mathematical operations are implemented for Numpy arrays. Note that these are elementwise:
%% Cell type:code id: tags:
``` python
a = np.array([[1,2], [3, 4], [5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12]])
a, b
```
%% Cell type:code id: tags:
``` python
a + b
```
%% Cell type:code id: tags:
``` python
a * b
```
%% Cell type:markdown id: tags:
... and so on. Matrix multiplication can be done with `@`. For matching shapes, we first have to transpose `b` by applying the transpose operation `.T`.
%% Cell type:code id: tags:
``` python
a @ b.T
```
%% Cell type:markdown id: tags:
The dot product can be performed with `np.dot()`. Do this with the two columns of `a`:
%% Cell type:code id: tags:
``` python
np.dot(a[:,0] , a[:,1])
```
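%% Cell type:markdown id: tags:
A quick shape check makes the role of the transpose visible: the inner dimensions of a matrix product must match. The sketch below repeats the arrays `a` and `b` from above so that it runs on its own; the names `product`, `col0` and `col1` are illustrative.
%% Cell type:code id: tags:
``` python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])        # shape (3, 2)
b = np.array([[7, 8], [9, 10], [11, 12]])     # shape (3, 2)

# (3, 2) @ (2, 3) -> (3, 3); without the transpose the inner sizes differ
product = a @ b.T
print(product.shape)

# For rank 1 arrays the dot product equals the sum of elementwise products
col0, col1 = a[:, 0], a[:, 1]
print(np.dot(col0, col1), (col0 * col1).sum())
```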
%% Cell type:markdown id: tags:
### Reshaping Numpy Arrays
When creating and working with complex arrays, you often have to adjust their shape so that it fits your task.
%% Cell type:code id: tags:
``` python
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
```
%% Cell type:markdown id: tags:
A simple, but effective method is `arr.reshape()`.
%% Cell type:code id: tags:
``` python
arr.reshape(2, 6)
```
%% Cell type:markdown id: tags:
If you want to transform an array of higher rank to rank one:
%% Cell type:code id: tags:
``` python
arr.flatten()
```
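%% Cell type:markdown id: tags:
`reshape` accepts `-1` for at most one dimension and infers its size from the total number of elements, which must not change. A minimal sketch, using an array `grid` with the same values as `arr` (the name is introduced here so that `arr` stays untouched):
%% Cell type:code id: tags:
``` python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)

# -1 lets Numpy infer one dimension from the total number of elements
print(grid.reshape(2, -1).shape)   # (2, 6)
print(grid.reshape(-1).shape)      # (12,)

# The total number of elements must stay the same
try:
    grid.reshape(5, 2)
except ValueError as err:
    print("reshape failed:", err)
```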
%% Cell type:markdown id: tags:
Adding an axis to the array to increase its rank:
%% Cell type:code id: tags:
``` python
arr[:,None].shape
```
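%% Cell type:markdown id: tags:
Indexing with `None` is the same as indexing with `np.newaxis`, and `np.expand_dims` does the same job through a function call. The sketch below compares them on an illustrative array `grid` that has the same shape as `arr`.
%% Cell type:code id: tags:
``` python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)

# np.newaxis is an alias for None, so both spellings give the same result
print(grid[:, np.newaxis].shape)            # (3, 1, 4)
print(grid[:, None].shape)                  # (3, 1, 4)

# np.expand_dims inserts the new axis at the given position
print(np.expand_dims(grid, axis=1).shape)   # (3, 1, 4)
print(np.expand_dims(grid, axis=-1).shape)  # (3, 4, 1)
```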
%% Cell type:markdown id: tags:
Removing a specific axis after adding two axes:
%% Cell type:code id: tags:
``` python
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr = arr[:,None, None]
arr.squeeze(1).shape
```
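%% Cell type:markdown id: tags:
Called without an argument, `squeeze` removes every axis of length 1, while an axis of any other length cannot be squeezed. A short sketch, again on an illustrative array `grid`:
%% Cell type:code id: tags:
``` python
import numpy as np

grid = np.arange(1, 13).reshape(3, 4)[:, None, None]   # shape (3, 1, 1, 4)

print(grid.squeeze(1).shape)   # (3, 1, 4): only axis 1 is removed
print(grid.squeeze().shape)    # (3, 4): all axes of length 1 are removed

# Only axes of length 1 can be squeezed
try:
    grid.squeeze(0)
except ValueError as err:
    print("squeeze failed:", err)
```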
%% Cell type:markdown id: tags:
### Matrix manipulation of Arrays
%% Cell type:code id: tags:
``` python
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
```
%% Cell type:code id: tags:
``` python
arr.shape
```
%% Cell type:markdown id: tags:
Delete columns 0 and 1, i.e. delete along axis 1.
%% Cell type:code id: tags:
``` python
np.delete(arr, [0,1], axis=1)
```
%% Cell type:markdown id: tags:
Insert a row of zeros before row 0 and before row 1, i.e. insert along axis 0.
%% Cell type:code id: tags:
``` python
np.insert(arr, [0,1], 0, axis=0)
```
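%% Cell type:markdown id: tags:
Both `np.delete` and `np.insert` return a new array and leave the input unchanged, so the result has to be assigned to a name if you want to keep it. A minimal sketch with the illustrative names `grid`, `without_cols` and `with_zero_rows`:
%% Cell type:code id: tags:
``` python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

without_cols = np.delete(grid, [0, 1], axis=1)
with_zero_rows = np.insert(grid, [0, 1], 0, axis=0)

print(grid.shape)             # (3, 4): the original is unchanged
print(without_cols.shape)     # (3, 2)
print(with_zero_rows.shape)   # (5, 4)
print(with_zero_rows)
```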
%% Cell type:markdown id: tags:
There is also a function to swap the axes of `arr`. Can you find it in the Numpy documentation or via Google? Swap axes 0 and 1!
%% Cell type:code id: tags:
``` python
np.swapaxes(arr, 1,0)
# your code goes here
```
%% Cell type:markdown id: tags:
Multiple arrays can also be combined in different ways.
%% Cell type:code id: tags:
``` python
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
```
%% Cell type:markdown id: tags:
Stack two arrays along axis 1.
%% Cell type:code id: tags:
``` python
np.stack([arr, arr], axis=1)
```
%% Cell type:markdown id: tags:
Concatenate two arrays along axis 0.
%% Cell type:code id: tags:
``` python
np.concatenate([arr, arr], axis=0)
```
%% Cell type:markdown id: tags:
The difference between `np.stack()` and `np.concatenate()` can be seen better if you have a look at the output shapes. Do this and compare them!
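One way to run this comparison is sketched below. The array `grid` repeats the values of `arr` and is named differently only to keep the cell self-contained.
%% Cell type:code id: tags:
``` python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])   # shape (3, 4)

# np.stack creates a new axis, so the rank grows by one
print(np.stack([grid, grid], axis=0).shape)         # (2, 3, 4)
print(np.stack([grid, grid], axis=1).shape)         # (3, 2, 4)

# np.concatenate joins along an existing axis, so the rank stays the same
print(np.concatenate([grid, grid], axis=0).shape)   # (6, 4)
print(np.concatenate([grid, grid], axis=1).shape)   # (3, 8)
```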
%% Cell type:markdown id: tags:
## Hands-on: Covid
You are in a room surrounded by other persons, but you are feeling a little bit sick.
Unfortunately, you just got a mail with your covid test result: you are positive!
(See plot below)
- If you assume that you might infect your 10 nearest neighbours (aka KNN with K=10), which persons would that affect?
- If you assume that you spread the virus within a radius of 3 meters, which persons would that affect?
Test your solution with different seeds.
Do not hesitate to ask Google for help!
Bonus: Have a look at the plotting tutorial and visualize your results in the plot below!
%% Cell type:code id: tags:
``` python
np.random.seed(42)
you = np.array([0,0])
others = np.random.randn(35, 2) * 4
```
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
plt.plot(you[0], you[1], 'o', color='red', label='you')
plt.plot(others[:,0], others[:,1], '.', color='black', label='other persons')
plt.xlabel("x in meters", fontsize=16)
plt.ylabel("y in meters", fontsize=16)
plt.legend()
```
%% Cell type:code id: tags:
``` python
# your code goes here
```
%% Cell type:markdown id: tags:
## What you should have learned from this notebook:
- What is the rank of an array?
- Dropping and appending elements on lists
- Creating Numpy arrays with `np.ones()` etc.
- Slicing arrays with `:`
- Accessing arrays with boolean indexing
- Reshaping arrays
- Adding and deleting axes
- Concatenating and stacking arrays