%% Cell type:markdown id: tags:
# PyTorch: Optimizing Model Parameters
Now that we have a model and data, it's time to train and validate our model by optimizing its parameters on
our data.
Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates
the error in its guess (*loss*), collects the derivatives of the error with respect to its parameters (as we saw in
the previous tutorial), and **optimizes** these parameters using gradient descent.
Let's start by defining our model.
%% Cell type:code id: tags:
``` python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
class ConvolutionalNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            # sizes correspond to: (batch, channels, height, width)
            # input (batch, 1, 28, 28)
            nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3),  # -> (batch, 3, 26, 26)
            nn.MaxPool2d(3, stride=2),                                # -> (batch, 3, 12, 12)
            nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3),  # -> (batch, 3, 10, 10)
            nn.MaxPool2d(3, stride=3),                                # -> (batch, 3, 3, 3)
            nn.Flatten(),                                             # -> (batch, 27)
            nn.Linear(27, 27),                                        # -> (batch, 27)
            nn.ReLU(),                                                # -> (batch, 27)
            nn.Linear(27, 10),                                        # -> (batch, 10)
            # no final Softmax: nn.CrossEntropyLoss (used below) expects raw logits
        )

    def forward(self, x):
        logits = self.layers(x)
        return logits
model = ConvolutionalNeuralNetwork()
```
%% Cell type:markdown id: tags:
As you can see, our neural network is a so-called *convolutional* neural network (CNN). Such networks are specialised to work on 2D inputs (or higher), like our FashionMNIST data.
Such a CNN typically consists of
- Convolutional layers
- Pooling layers
- Linear layers
The first two layer types act on 2D data.
In doing so, an additional dimension is introduced: the *channel* dimension (or *filters*).
For images, these could correspond to the color channels red, green and blue.
Since our FashionMNIST data are greyscale pictures, this channel dimension is 1 for the input data.
You should know the linear layer from the previous model-building tutorial.
Since it works on 1D data, `nn.Flatten()` is applied before it.
Can you find out how convolutional and pooling layers act on our 2D data? Do some research, then verify your expectation with the quick shape check below the hint!
<details>
<summary>Click to reveal hint</summary>
<blockquote>
Have a look at <a href="https://androidkt.com/calculate-output-size-convolutional-pooling-layers-cnn/">this</a> site to understand pooling and convolutional layers!
</blockquote>
</details>
<style>
/* Style for the collapsible section */
details {
border: 1px solid #ccc;
border-radius: 4px;
padding: 8px;
margin: 8px 0;
}
/* Style for the summary (button) */
summary {
cursor: pointer;
outline: none;
}
/* Style for the hint text */
blockquote {
margin: 8px;
}
</style>
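%% Cell type:markdown id: tags:
To verify how the convolutional and pooling layers change the tensor shapes, we can pass a dummy batch through the model defined above and print the output shape after each layer (an optional sketch, not needed for the rest of the tutorial):
%% Cell type:code id: tags:
``` python
# Shape check: feed a dummy greyscale "image" through each layer in turn
x = torch.zeros(1, 1, 28, 28)  # (batch, channels, height, width)
for layer in model.layers:
    x = layer(x)
    print(f"{layer.__class__.__name__:<10} -> output shape {tuple(x.shape)}")
```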
%% Cell type:markdown id: tags:
## Hyperparameters
Hyperparameters are adjustable parameters that let you control the model optimization process.
Different hyperparameter values can impact model training and convergence rates
([read more](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html) about hyperparameter tuning).
We define the following hyperparameters for training:
- **Number of Epochs** - the number of times to iterate over the dataset
- **Batch Size** - the number of data samples propagated through the network before the parameters are updated
- **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slower learning, while large values may result in unpredictable behavior during training.
%% Cell type:code id: tags:
``` python
learning_rate = 1e-3
batch_size = 64
epochs = 5
```
%% Cell type:markdown id: tags:
## Optimization Loop
Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each
iteration of the optimization loop is called an **epoch**.
%% Cell type:markdown id: tags:
To do so we need an iterable Python object, holding our training data, to loop over.
You could handle this on your own, but PyTorch provides a [Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) and [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) class.
The `Dataset` allows you to load standard datasets (like FashionMNIST) or to use your own data (e.g. physics data).
A `Dataset` instance can then be used to initialize a `DataLoader`.
When initializing a `DataLoader`, you can define the batch size and how batches are sampled from the dataset.
Once you have initialized your `DataLoader`, you can simply loop over it, as you will see below.
For such a training, you usually split your dataset into several parts:
- The training set is used to train the model.
- The validation set is used for mechanisms during training, e.g. to stop training when the model no longer improves on the validation dataset (early stopping); it can therefore influence the training (e.g. the number of epochs).
- The test set is another dataset used to determine the final model performance; it should therefore be completely independent of the training.
So let's load our `Dataset`s and `DataLoader`s - one each for a training and a validation set.
%% Cell type:code id: tags:
``` python
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

val_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
val_dataloader = DataLoader(val_data, batch_size=64)
```
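%% Cell type:markdown id: tags:
To get a feeling for what the `DataLoader` yields, we can grab a single batch and inspect its shapes (an optional sketch): with `batch_size=64`, the images arrive as a tensor of shape `(64, 1, 28, 28)` and the labels as `(64,)`.
%% Cell type:code id: tags:
``` python
# Peek at one batch from the training DataLoader
X, y = next(iter(train_dataloader))
print(f"Image batch shape: {tuple(X.shape)}")  # (64, 1, 28, 28): batch, channel, height, width
print(f"Label batch shape: {tuple(y.shape)}")  # (64,): one class index per image
```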
%% Cell type:markdown id: tags:
Each epoch consists of two main parts:
- **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
- **The Validation Loop** - iterate over the validation dataset to check if model performance is improving.
Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to
the full implementation of the optimization loop below.
### Loss Function
When presented with some training data, our untrained network is likely not to give the correct
answer. A **loss function** measures the degree of dissimilarity of the obtained result to the target value,
and it is the loss function that we want to minimize during training. To calculate the loss we make a
prediction using the inputs of our given data sample and compare it against the true data label value.
Common loss functions include [nn.MSELoss](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) (Mean Square Error) for regression tasks, and
[nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss) (Negative Log Likelihood) for classification.
[nn.CrossEntropyLoss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss) combines ``nn.LogSoftmax`` and ``nn.NLLLoss``.
We pass our model's output logits to ``nn.CrossEntropyLoss``, which will normalize the logits and compute the prediction error.
%% Cell type:code id: tags:
``` python
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
```
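%% Cell type:markdown id: tags:
As a quick sanity check (optional), we can evaluate the loss of the untrained model on a single batch: with 10 classes and essentially random predictions, the cross-entropy loss should be close to -ln(1/10) ≈ 2.3.
%% Cell type:code id: tags:
``` python
# Loss of the untrained model on one batch (illustrative only)
X, y = next(iter(train_dataloader))
with torch.no_grad():
    pred = model(X)
    print(f"Initial loss: {loss_fn(pred, y).item():.3f}")  # roughly ln(10) ≈ 2.30
```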
%% Cell type:markdown id: tags:
### Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
All optimization logic is encapsulated in the ``optimizer`` object. Here, we use the SGD optimizer; there are many other [optimizers](https://pytorch.org/docs/stable/optim.html)
available in PyTorch, such as Adam and RMSProp, that may work better for different kinds of models and data.
We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.
%% Cell type:code id: tags:
``` python
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```
%% Cell type:markdown id: tags:
Inside the training loop, optimization happens in three steps (see the short sketch after this list):
* Call ``optimizer.zero_grad()`` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
* Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter.
* Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.
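%% Cell type:markdown id: tags:
Applied to a single batch, the three steps look like this (a small sketch that performs one real parameter update; the `train_loop` below wraps exactly these calls):
%% Cell type:code id: tags:
``` python
# One optimization step on a single batch
X, y = next(iter(train_dataloader))
optimizer.zero_grad()          # 1. reset the gradients (they accumulate by default)
loss = loss_fn(model(X), y)    # forward pass and loss
loss.backward()                # 2. backpropagate: compute gradients of the loss w.r.t. the parameters
optimizer.step()               # 3. adjust the parameters using the collected gradients
```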
%% Cell type:markdown id: tags:
## Full Implementation
We define ``train_loop`` that loops over our optimization code, and ``val_loop`` that
evaluates the model's performance against our validation data.
%% Cell type:code id: tags:
``` python
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def val_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    val_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during validation;
    # this also reduces unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            val_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    val_loss /= num_batches
    correct /= size
    print(f"Val Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {val_loss:>8f} \n")
```
%% Cell type:markdown id: tags:
We initialize the loss function and optimizer, and pass them to `train_loop` and `val_loop`.
%% Cell type:code id: tags:
``` python
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    val_loop(val_dataloader, model, loss_fn)
print("Done!")
```
%% Cell type:markdown id: tags:
After the training you might want to save your model to disk in order to use it for further testing:
%% Cell type:code id: tags:
``` python
path = 'my_cnn.torch'
# save model
torch.save(model.state_dict(), path) # save model parameters
del model
# load model
model = ConvolutionalNeuralNetwork() # initialize model & parameters
model.load_state_dict(torch.load(path)) # load model parameters
model
```
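%% Cell type:markdown id: tags:
The reloaded model can then be used for inference, for example on a single validation image. The class names below are listed in the FashionMNIST label order; this cell is only an illustrative sketch.
%% Cell type:code id: tags:
``` python
# Predict the class of one validation sample with the reloaded model
classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

model.eval()
x, y = val_data[0]                # one image tensor and its true label
with torch.no_grad():
    pred = model(x.unsqueeze(0))  # add the batch dimension: (1, 1, 28, 28)
print(f"Predicted: {classes[pred.argmax(1).item()]}, Actual: {classes[y]}")
```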
%% Cell type:markdown id: tags:
Tasks for experts:
- **Train on GPU**: the training above runs on the CPU. Define a parameter `device` that is `'cuda:0'` if a GPU is available and `'cpu'` otherwise. Then adapt the code above so that the training runs on `device`.
- **Implement Early Stopping**: Turn the `for` loop above (`for t in range(epochs)`) into a `while` loop and stop the training if `val_loss` did not decrease for 2 epochs in a row. To test this, you can hinder the training on purpose by using an unreasonably high learning rate, such as `learning_rate=1`.
%% Cell type:markdown id: tags:
## What you should have learned from this notebook:
- Explain the basic principle of a convolutional layer: What is a kernel? What is a channel?
- Name three hyperparameters of a typical training.
- Why should you split your data into at least two datasets?
- Which module does PyTorch provide to manage your data?
- What is an epoch?
- Name an example of a loss function!