%% Cell type:markdown id: tags:
# PyTorch: Optimizing Model Parameters
Now that we have a model and data, it's time to train and validate our model by optimizing its parameters on
our data.
Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates
the error in its guess (*loss*), collects the derivatives of the error with respect to its parameters (as we saw in
the previous tutorial), and **optimizes** these parameters using gradient descent.
Let's start by defining our model.
%% Cell type:code id: tags:
``` python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
class ConvolutionalNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            # sizes correspond to: (batch, channels, height, width)
            # input (batch, 1, 28, 28)
            nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3),  # -> (batch, 3, 26, 26)
            nn.MaxPool2d(3, stride=2),                                # -> (batch, 3, 12, 12)
            nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3),  # -> (batch, 3, 10, 10)
            nn.MaxPool2d(3, stride=3),                                # -> (batch, 3, 3, 3)
            nn.Flatten(),                                             # -> (batch, 27)
            nn.Linear(27, 27),                                        # -> (batch, 27)
            nn.ReLU(),                                                # -> (batch, 27)
            nn.Linear(27, 10),                                        # -> (batch, 10)
            # no final Softmax: nn.CrossEntropyLoss (used below) expects raw logits
        )

    def forward(self, x):
        logits = self.layers(x)
        return logits
model = ConvolutionalNeuralNetwork()
```
%% Cell type:markdown id: tags:
As you can see, our neural network is a so-called *convolutional* neural network (CNN). Such networks are specialised to work on 2D inputs (or higher), like our FashionMNIST data.
Such a CNN typically consists of
- Convolutional layers
- Pooling layers
- Linear layers
The first two layer types act on 2D data.
In doing so, an additional dimension is introduced: the *channel* dimension (or *filters*).
For images, these could correspond to the color channels red, green and blue.
Since our FashionMNIST data are greyscale pictures, this channel dimension is 1 for the input data.
You should know the linear layer from the previous model-building tutorial.
Since it works on 1D data, `nn.Flatten()` is applied before it.
Can you find out how convolutional and pooling layers act on our 2D data? Do some research, then verify your expectation with the quick shape check below the hint!
<details>
<summary>Click to reveal hint</summary>
<blockquote>
Have a look at <a href="https://androidkt.com/calculate-output-size-convolutional-pooling-layers-cnn/">this</a> site to understand pooling and convolutional layers!
</blockquote>
</details>
<style>
/* Style for the collapsible section */
details {
border: 1px solid #ccc;
border-radius: 4px;
padding: 8px;
margin: 8px 0;
}
/* Style for the summary (button) */
summary {
cursor: pointer;
outline: none;
}
/* Style for the hint text */
blockquote {
margin: 8px;
}
</style>
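%% Cell type:markdown id: tags:
To verify how the convolutional and pooling layers change the tensor shapes, we can pass a dummy batch through the model defined above and print the output shape after each layer (an optional sketch, not needed for the rest of the tutorial):
%% Cell type:code id: tags:
``` python
# Shape check: feed a dummy greyscale "image" through each layer in turn
x = torch.zeros(1, 1, 28, 28)  # (batch, channels, height, width)
for layer in model.layers:
    x = layer(x)
    print(f"{layer.__class__.__name__:<10} -> output shape {tuple(x.shape)}")
```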
%% Cell type:markdown id: tags:
## Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process.
Different hyperparameter values can impact model training and convergence rates
([read more](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html) about hyperparameter tuning).
We define the following hyperparameters for training:
- **Number of Epochs** - the number of times to iterate over the dataset
- **Batch Size** - the number of data samples propagated through the network before the parameters are updated
- **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slower learning, while large values may result in unpredictable behavior during training.
%% Cell type:code id: tags:
``` python
learning_rate = 1e-3
batch_size = 64
epochs = 5
```
%% Cell type:markdown id: tags:
## Optimization Loop
Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each
iteration of the optimization loop is called an **epoch**.
%% Cell type:markdown id: tags:
To do so we need an iterable Python object, holding our training data, to loop over.
You could handle this on your own, but PyTorch provides a [Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) and [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class.
The `Dataset` allows you to load standard datasets (like FashionMNIST) or to use your own data (e.g. physics data).
A `Dataset` instance can then be used to initialize a `DataLoader`.
When initializing a `DataLoader`, you can define the batch size and how batches are sampled from the dataset.
Once you have initialized your `DataLoader`, you can simply loop over it, as you will see below.
For such a training, you usually split your dataset into several parts:
- The training set is used to train the model.
- The validation set is used for mechanisms during training, e.g. to stop training when the model no longer improves on the validation dataset (early stopping); it can therefore influence the training (e.g. the number of epochs).
- The test set is another dataset used to determine the final model performance; it should therefore be completely independent of the training.
So let's load our `Dataset`s and `DataLoader`s - one each for a training and a validation set.
%% Cell type:code id: tags:
``` python
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

val_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
val_dataloader = DataLoader(val_data, batch_size=64)
```
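%% Cell type:markdown id: tags:
To get a feeling for what the `DataLoader` yields, we can grab a single batch and inspect its shapes (an optional sketch): with `batch_size=64`, the images arrive as a tensor of shape `(64, 1, 28, 28)` and the labels as `(64,)`.
%% Cell type:code id: tags:
``` python
# Peek at one batch from the training DataLoader
X, y = next(iter(train_dataloader))
print(f"Image batch shape: {tuple(X.shape)}")  # (64, 1, 28, 28): batch, channel, height, width
print(f"Label batch shape: {tuple(y.shape)}")  # (64,): one class index per image
```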
%% Cell type:markdown id: tags:
Each epoch consists of two main parts:
- **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
- **The Validation Loop** - iterate over the validation dataset to check if model performance is improving.
Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to
the full implementation of the optimization loop below.
### Loss Function
When presented with some training data, our untrained network is likely not to give the correct
answer. A **loss function** measures the degree of dissimilarity of the obtained result to the target value,
and it is the loss function that we want to minimize during training. To calculate the loss we make a
prediction using the inputs of our given data sample and compare it against the true data label value.
Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and
[nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification.
[nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines ``nn.LogSoftmax`` and ``nn.NLLLoss``.
We pass our model's output logits to ``nn.CrossEntropyLoss``, which will normalize the logits and compute the prediction error.
%% Cell type:code id: tags:
``` python
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
```
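%% Cell type:markdown id: tags:
As a quick sanity check (optional), we can evaluate the loss of the untrained model on a single batch: with 10 classes and essentially random predictions, the cross-entropy loss should be close to -ln(1/10) ≈ 2.3.
%% Cell type:code id: tags:
``` python
# Loss of the untrained model on one batch (illustrative only)
X, y = next(iter(train_dataloader))
with torch.no_grad():
    pred = model(X)
    print(f"Initial loss: {loss_fn(pred, y).item():.3f}")  # roughly ln(10) ≈ 2.30
```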
%% Cell type:markdown id: tags:
### Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in the ``optimizer`` object. Here, we use the SGD optimizer; there are many other [optimizers](https://pytorch.org/docs/stable/optim.html)
available in PyTorch, such as Adam and RMSProp, that may work better for different kinds of models and data.
We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.
%% Cell type:code id: tags:
``` python
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```
%% Cell type:markdown id: tags:
Inside the training loop, optimization happens in three steps (see the short sketch after this list):
* Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
* Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter.
* Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.
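%% Cell type:markdown id: tags:
Applied to a single batch, the three steps look like this (a small sketch that performs one real parameter update; the `train_loop` below wraps exactly these calls):
%% Cell type:code id: tags:
``` python
# One optimization step on a single batch
X, y = next(iter(train_dataloader))
optimizer.zero_grad()          # 1. reset the gradients (they accumulate by default)
loss = loss_fn(model(X), y)    # forward pass and loss
loss.backward()                # 2. backpropagate: compute gradients of the loss w.r.t. the parameters
optimizer.step()               # 3. adjust the parameters using the collected gradients
```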
%% Cell type:markdown id: tags:
## Full Implementation
We define ``train_loop`` that loops over our optimization code, and ``val_loop`` that
evaluates the model's performance against our validation data.
%% Cell type:code id: tags:
``` python
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def val_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    val_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during validation;
    # this also reduces unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            val_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    val_loss /= num_batches
    correct /= size
    print(f"Val Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {val_loss:>8f} \n")
```
%% Cell type:markdown id: tags:
We initialize the loss function and optimizer, and pass them to `train_loop` and `val_loop`.
%% Cell type:code id: tags:
``` python
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    val_loop(val_dataloader, model, loss_fn)
print("Done!")
```
%% Cell type:markdown id: tags:
After the training you might want to save your model to disk in order to use it for further testing:
%% Cell type:code id: tags:
``` python
path = 'my_cnn.torch'
# save model
torch.save(model.state_dict(), path) # save model parameters
del model
# load model
model = ConvolutionalNeuralNetwork() # initialize model & parameters
model.load_state_dict(torch.load(path)) # load model parameters
model
```
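%% Cell type:markdown id: tags:
The reloaded model can then be used for inference, for example on a single validation image. The class names below are listed in the FashionMNIST label order; this cell is only an illustrative sketch.
%% Cell type:code id: tags:
``` python
# Predict the class of one validation sample with the reloaded model
classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

model.eval()
x, y = val_data[0]                # one image tensor and its true label
with torch.no_grad():
    pred = model(x.unsqueeze(0))  # add the batch dimension: (1, 1, 28, 28)
print(f"Predicted: {classes[pred.argmax(1).item()]}, Actual: {classes[y]}")
```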
%% Cell type:markdown id: tags:
Tasks for experts:
- **Train on GPU**: the training above runs on the CPU. Define a parameter `device` that is `'cuda:0'` if a GPU is available and `'cpu'` otherwise. Then adapt the code above so that the training runs on `device`.
- **Implement Early Stopping**: Turn the `for` loop above (`for t in range(epochs)`) into a `while` loop and stop the training if `val_loss` did not decrease for 2 epochs in a row. To test this, you can hinder the training on purpose by using an unreasonably high learning rate, such as `learning_rate=1`.
%% Cell type:markdown id: tags:
## What you should have learned from this notebook:
- Explain the basic principle of a convolutional layer: What is a kernel? What is a channel?
- Name three hyperparameters of a typical training.
- Why should you split your data into at least two datasets?
- Which module does PyTorch provide to manage your data?
- What is an epoch?
- Name an example of a loss function!