"This Tutorial will give you an overview about PyTorch tensors and PyTorch's build in autograd function to perform backpropagation, which is the key mechanism for machine learning.\n",
"\n",
"This tutorial is adapted from https://pytorch.org/tutorials/beginner/basics/intro.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tensors\n",
"\n",
"Tensors are a specialized data structure that are very similar to arrays and matrices.\n",
"In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.\n",
"\n",
"Tensors are similar to [NumPy’s](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and\n",
"NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors\n",
"are also optimized for automatic differentiation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initializing a Tensor\n",
"\n",
"Tensors can be initialized in various ways. Take a look at the following examples:\n",
"\n",
"**Directly from data**\n",
"\n",
"Tensors can be created directly from data. The data type is automatically inferred.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"data = [[1, 2],[3, 4]]\n",
"x_data = torch.tensor(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From a NumPy array**\n",
"\n",
"Tensors can be created from NumPy arrays.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"np_array = np.array(data)\n",
"x_np = torch.from_numpy(np_array)"
]
},
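{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tensors created with `torch.from_numpy` share their underlying memory with the source array, as mentioned above. A small illustration: an in-place change to the NumPy array is visible in the tensor as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# in-place change to the NumPy array ...\n",
"np.add(np_array, 1, out=np_array)\n",
"\n",
"# ... is reflected in the tensor, because both share the same memory\n",
"print(np_array)\n",
"print(x_np)"
]
},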
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From another tensor:**\n",
"\n",
"The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x_ones = torch.ones_like(x_data) # retains the properties of x_data\n",
"x_ones"
]
},
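{
"cell_type": "markdown",
"metadata": {},
"source": [
"To override a property explicitly, pass it as an argument. A minimal sketch: keep the shape of `x_data` but override its datatype (`torch.rand_like` needs a floating-point dtype)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# same shape as x_data, but with the datatype overridden to float\n",
"x_rand = torch.rand_like(x_data, dtype=torch.float)\n",
"x_rand"
]
},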
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create different PyTorch tensors and try to use the Numpy operations and manipulations you learnt in the previous Numpy tutorial. Nearly all operations are applicable analogously."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code goes here"
]
},
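{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need inspiration, here is a minimal sketch of a few operations that work just like their NumPy counterparts (indexing, transposing, and reductions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t = torch.arange(12).reshape(3, 4)\n",
"print(t[0, :])          # first row\n",
"print(t.T)              # transpose\n",
"print(t.sum(dim=0))     # column-wise sum\n",
"print(t.float().mean()) # mean requires a floating-point dtype"
]
},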
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attributes of a Tensor & its Device\n",
"\n",
"Tensor attributes describe their shape, datatype, and the device on which they are stored.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"tensor = torch.rand(3,4)\n",
"\n",
"print(f\"Shape of tensor: {tensor.shape}\")\n",
"print(f\"Datatype of tensor: {tensor.dtype}\")\n",
"print(f\"Device tensor is stored on: {tensor.device}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the tensor is currently stored on `cpu`. If you want to switch the device of your tensor zo e.g. a GPU, you can to that with the `.to()` operation. GPU devices are enumerated starting from zero: `cuda:0`, `cuda:1`, ... \n",
"\n",
"In the scope of this Blockseminar you will get access to a GPU, if you do not have access yet, the following lines are there to show you the concept of PyTorch devices and might throw errors.\n",
"\n",
"Create a tensor directly on the GPU:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = torch.randn((2,4), device='cuda:0')\n",
"data.device"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Move it back and forth. `.cpu()` is a shortcut for `.to('cpu)`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.to('cpu')\n",
"data.device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.to('cuda:0')\n",
"data.device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.cpu()\n",
"data.device"
]
},
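{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since not every machine has a GPU, a common pattern (sketched below) is to pick the device at runtime with `torch.cuda.is_available()` and move tensors to it. This way the same code also runs on CPU-only machines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# choose the device at runtime so the code also works without a GPU\n",
"device = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n",
"data = data.to(device)\n",
"print(device, data.device)"
]
},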
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatic Differentiation with ``torch.autograd``\n",
"\n",
"When training a model, e.g. a neural network, the most frequently used algorithm is\n",
"**back propagation**. In this algorithm, parameters (model weights) are\n",
"adjusted according to the **gradient** of the loss function with respect\n",
"to the given parameter.\n",
"\n",
"To compute those gradients, PyTorch has a built-in differentiation engine\n",
"called ``torch.autograd``. It supports automatic computation of gradient for any\n",
"computational graph.\n",
"\n",
"Consider a simple linear function, with input ``x``,\n",
"parameters ``w`` and ``b``, and loss function which computes the mean squared error between the a target `y` and the output `z` of our linear function. It can be defined in\n",
"PyTorch in the following manner:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.ones(5) # input tensor\n",
"y = torch.zeros(3) # expected output\n",
"w = torch.randn(5, 3, requires_grad=True)\n",
"b = torch.randn(3, requires_grad=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we told PyTorch that we want to compute gradients for the parameters `w` and `b` by stating `requires_grad=True`! \n",
"Now we can start operation with our tensors while PyTorch will track each operation which is somehow connected to `w` or `b`.\n",
"This means that PyTorch creates a computational graph and stores to compute gradients later.\n",
"Not let's compose a linear operation and loss computation to create our graph.\n",
"\n",
"This step is in general called `forward pass`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"loss = torch.nn.functional.mse_loss(z, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code defines the following **computational graph**:\n",
"\n",
"\n",
"\n",
"\n",
"In this network, ``w`` and ``b`` are **parameters**, which we need to\n",
"optimize. Thus, we need to be able to compute the gradients of the loss\n",
"function with respect to those variables. In order to do that, we set\n",
"the ``requires_grad`` property of those tensors.\n",
"\n"
]
},
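{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can peek at this graph: every tensor produced by a tracked operation stores a reference to the function that created it in its `grad_fn` attribute. A small illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# backward functions recorded in the computational graph\n",
"print(f\"Gradient function for z = {z.grad_fn}\")\n",
"print(f\"Gradient function for loss = {loss.grad_fn}\")"
]
},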
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing Gradients\n",
"\n",
"To optimize weights of parameters in the neural network, we need to\n",
"compute the derivatives of our loss function with respect to parameters,\n",
"namely, we need $\\frac{\\partial loss}{\\partial w}$ and\n",
"$\\frac{\\partial loss}{\\partial b}$ under some fixed values of\n",
"``x`` and ``y``. To compute those derivatives, we call\n",
"``loss.backward()``, and then retrieve the values from ``w.grad`` and\n",
"``b.grad``:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loss.backward()\n",
"print(w.grad)\n",
"print(b.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You just computed gradients using PyTorch's autograd method. Later, we will use these gradients to update our parameters (here `w` and `b`). \n",
"\n",
"Note that this step is a central point to train your model. Especially for complex and big models holds:\n",
"- Computing gradients is very costly, therefore one should use GPUs which are way faster than CPUs\n",
"- Storing computational graphs needs a lot of RAM (ideally on your GPU), therefore you should have an eye on efficient gradient tracking"
]
},
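{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of such an update step (the learning rate `lr` is just an illustrative choice; in practice you would use an optimizer from `torch.optim`), the parameters can be updated in-place while gradient tracking is disabled with `torch.no_grad()`, which is explained in the next section:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lr = 0.1 # illustrative learning rate\n",
"\n",
"# one hand-written gradient-descent step\n",
"with torch.no_grad():\n",
"    w -= lr * w.grad\n",
"    b -= lr * b.grad\n",
"\n",
"# reset the accumulated gradients before the next backward pass\n",
"w.grad.zero_()\n",
"b.grad.zero_()"
]
},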
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Disabling Gradient Tracking\n",
"\n",
"By default, all tensors with ``requires_grad=True`` are tracking their\n",
"computational history and support gradient computation. However, there\n",
"are some cases when we do not need to do that, for example, when we have\n",
"trained the model and just want to apply it to some input data, i.e. we\n",
"only want to do *forward* computations through the network. We can stop\n",
"tracking computations by surrounding our computation code with\n",
"``torch.no_grad()`` block:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"print(z.requires_grad)\n",
"\n",
"with torch.no_grad():\n",
" z = torch.matmul(x, w)+b\n",
"print(z.requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another way to achieve the same result is to use the ``detach()`` method\n",
"on the tensor:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"z_det = z.detach()\n",
"print(z_det.requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are reasons you might want to disable gradient tracking:\n",
" - To mark some parameters in your neural network as **frozen parameters**.\n",
" - To **speed up computations** when you are only doing forward pass, because computations on tensors that do\n",
" not track gradients would be more efficient.\n",
"\n"
]
},
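{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a tiny sketch of the first point: you can freeze an individual tensor by switching off its `requires_grad` flag in-place with `requires_grad_()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# freeze w: operations involving it are no longer tracked\n",
"w.requires_grad_(False)\n",
"print(w.requires_grad)\n",
"\n",
"# unfreeze it again so the earlier cells can still be re-run\n",
"w.requires_grad_(True)\n",
"print(w.requires_grad)"
]
},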
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What you should have learned from this notebook:\n",
"\n",
"- What is a PyTorch Tensor & what is thair advantage compared to plain Numpy arrays?\n",
"- Why should you make Tensoroperations on GPUs\n",
"- What is PyTorch's autograd mechanism and how does it work?\n",
"- How to compute gradients and how to disable gradient tracking\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 0
}