"This Tutorial will give you an overview about PyTorch tensors and PyTorch's build in autograd function to perform backpropagation, which is the key mechanism for machine learning.\n",
"\n",
"This tutorial is adapted from https://pytorch.org/tutorials/beginner/basics/intro.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tensors\n",
"\n",
"Tensors are a specialized data structure that are very similar to arrays and matrices.\n",
"In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.\n",
"\n",
"Tensors are similar to [NumPy’s](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and\n",
"NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors\n",
"are also optimized for automatic differentiation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initializing a Tensor\n",
"\n",
"Tensors can be initialized in various ways. Take a look at the following examples:\n",
"\n",
"**Directly from data**\n",
"\n",
"Tensors can be created directly from data. The data type is automatically inferred.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"data = [[1, 2],[3, 4]]\n",
"x_data = torch.tensor(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From a NumPy array**\n",
"\n",
"Tensors can be created from NumPy arrays.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"np_array = np.array(data)\n",
"x_np = torch.from_numpy(np_array)"
]
},
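{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tensors created with `torch.from_numpy` share their underlying memory with the source array, as mentioned above. A small illustration: an in-place change to the NumPy array is visible in the tensor as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# in-place change to the NumPy array ...\n",
"np.add(np_array, 1, out=np_array)\n",
"\n",
"# ... is reflected in the tensor, because both share the same memory\n",
"print(np_array)\n",
"print(x_np)"
]
},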
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From another tensor:**\n",
"\n",
"The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x_ones = torch.ones_like(x_data) # retains the properties of x_data\n",
"x_ones"
]
},
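{
"cell_type": "markdown",
"metadata": {},
"source": [
"To override a property explicitly, pass it as an argument. A minimal sketch: keep the shape of `x_data` but override its datatype (`torch.rand_like` needs a floating-point dtype)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# same shape as x_data, but with the datatype overridden to float\n",
"x_rand = torch.rand_like(x_data, dtype=torch.float)\n",
"x_rand"
]
},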
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create different PyTorch tensors and try to use the Numpy operations and manipulations you learnt in the previous Numpy tutorial. Nearly all operations are applicable analogously."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code goes here"
]
},
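{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need inspiration, here is a minimal sketch of a few operations that work just like their NumPy counterparts (indexing, transposing, and reductions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t = torch.arange(12).reshape(3, 4)\n",
"print(t[0, :])          # first row\n",
"print(t.T)              # transpose\n",
"print(t.sum(dim=0))     # column-wise sum\n",
"print(t.float().mean()) # mean requires a floating-point dtype"
]
},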
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attributes of a Tensor & its Device\n",
"\n",
"Tensor attributes describe their shape, datatype, and the device on which they are stored.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"tensor = torch.rand(3,4)\n",
"\n",
"print(f\"Shape of tensor: {tensor.shape}\")\n",
"print(f\"Datatype of tensor: {tensor.dtype}\")\n",
"print(f\"Device tensor is stored on: {tensor.device}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the tensor is currently stored on `cpu`. If you want to switch the device of your tensor zo e.g. a GPU, you can to that with the `.to()` operation. GPU devices are enumerated starting from zero: `cuda:0`, `cuda:1`, ... \n",
"\n",
"In the scope of this Blockseminar you will get access to a GPU, if you do not have access yet, the following lines are there to show you the concept of PyTorch devices and might throw errors.\n",
"\n",
"Create a tensor directly on the GPU:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = torch.randn((2,4), device='cuda:0')\n",
"data.device"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Move it back and forth. `.cpu()` is a shortcut for `.to('cpu)`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.to('cpu')\n",
"data.device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.to('cuda:0')\n",
"data.device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.cpu()\n",
"data.device"
]
},
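{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since not every machine has a GPU, a common pattern (sketched below) is to pick the device at runtime with `torch.cuda.is_available()` and move tensors to it. This way the same code also runs on CPU-only machines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# choose the device at runtime so the code also works without a GPU\n",
"device = 'cuda:0' if torch.cuda.is_available() else 'cpu'\n",
"data = data.to(device)\n",
"print(device, data.device)"
]
},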
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatic Differentiation with ``torch.autograd``\n",
"\n",
"When training a model, e.g. a neural network, the most frequently used algorithm is\n",
"**back propagation**. In this algorithm, parameters (model weights) are\n",
"adjusted according to the **gradient** of the loss function with respect\n",
"to the given parameter.\n",
"\n",
"To compute those gradients, PyTorch has a built-in differentiation engine\n",
"called ``torch.autograd``. It supports automatic computation of gradient for any\n",
"computational graph.\n",
"\n",
"Consider a simple linear function, with input ``x``,\n",
"parameters ``w`` and ``b``, and loss function which computes the mean squared error between the a target `y` and the output `z` of our linear function. It can be defined in\n",
"PyTorch in the following manner:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.ones(5) # input tensor\n",
"y = torch.zeros(3) # expected output\n",
"w = torch.randn(5, 3, requires_grad=True)\n",
"b = torch.randn(3, requires_grad=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we told PyTorch that we want to compute gradients for the parameters `w` and `b` by stating `requires_grad=True`! \n",
"Now we can start operation with our tensors while PyTorch will track each operation which is somehow connected to `w` or `b`.\n",
"This means that PyTorch creates a computational graph and stores to compute gradients later.\n",
"Not let's compose a linear operation and loss computation to create our graph.\n",
"\n",
"This step is in general called `forward pass`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"loss = torch.nn.functional.mse_loss(z, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code defines the following **computational graph**:\n",
"\n",
"\n",
"\n",
"\n",
"In this network, ``w`` and ``b`` are **parameters**, which we need to\n",
"optimize. Thus, we need to be able to compute the gradients of the loss\n",
"function with respect to those variables. In order to do that, we set\n",
"the ``requires_grad`` property of those tensors.\n",
"\n"
]
},
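{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can peek at this graph: every tensor produced by a tracked operation stores a reference to the function that created it in its `grad_fn` attribute. A small illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# backward functions recorded in the computational graph\n",
"print(f\"Gradient function for z = {z.grad_fn}\")\n",
"print(f\"Gradient function for loss = {loss.grad_fn}\")"
]
},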
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing Gradients\n",
"\n",
"To optimize weights of parameters in the neural network, we need to\n",
"compute the derivatives of our loss function with respect to parameters,\n",
"namely, we need $\\frac{\\partial loss}{\\partial w}$ and\n",
"$\\frac{\\partial loss}{\\partial b}$ under some fixed values of\n",
"``x`` and ``y``. To compute those derivatives, we call\n",
"``loss.backward()``, and then retrieve the values from ``w.grad`` and\n",
"``b.grad``:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loss.backward()\n",
"print(w.grad)\n",
"print(b.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You just computed gradients using PyTorch's autograd method. Later, we will use these gradients to update our parameters (here `w` and `b`). \n",
"\n",
"Note that this step is a central point to train your model. Especially for complex and big models holds:\n",
"- Computing gradients is very costly, therefore one should use GPUs which are way faster than CPUs\n",
"- Storing computational graphs needs a lot of RAM (ideally on your GPU), therefore you should have an eye on efficient gradient tracking"
]
},
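{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of such an update step (the learning rate `lr` is just an illustrative choice; in practice you would use an optimizer from `torch.optim`), the parameters can be updated in-place while gradient tracking is disabled with `torch.no_grad()`, which is explained in the next section:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lr = 0.1 # illustrative learning rate\n",
"\n",
"# one hand-written gradient-descent step\n",
"with torch.no_grad():\n",
"    w -= lr * w.grad\n",
"    b -= lr * b.grad\n",
"\n",
"# reset the accumulated gradients before the next backward pass\n",
"w.grad.zero_()\n",
"b.grad.zero_()"
]
},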
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Disabling Gradient Tracking\n",
"\n",
"By default, all tensors with ``requires_grad=True`` are tracking their\n",
"computational history and support gradient computation. However, there\n",
"are some cases when we do not need to do that, for example, when we have\n",
"trained the model and just want to apply it to some input data, i.e. we\n",
"only want to do *forward* computations through the network. We can stop\n",
"tracking computations by surrounding our computation code with\n",
"``torch.no_grad()`` block:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"print(z.requires_grad)\n",
"\n",
"with torch.no_grad():\n",
" z = torch.matmul(x, w)+b\n",
"print(z.requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another way to achieve the same result is to use the ``detach()`` method\n",
"on the tensor:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"z_det = z.detach()\n",
"print(z_det.requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are reasons you might want to disable gradient tracking:\n",
" - To mark some parameters in your neural network as **frozen parameters**.\n",
" - To **speed up computations** when you are only doing forward pass, because computations on tensors that do\n",
" not track gradients would be more efficient.\n",
"\n"
]
},
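{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a tiny sketch of the first point: you can freeze an individual tensor by switching off its `requires_grad` flag in-place with `requires_grad_()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# freeze w: operations involving it are no longer tracked\n",
"w.requires_grad_(False)\n",
"print(w.requires_grad)\n",
"\n",
"# unfreeze it again so the earlier cells can still be re-run\n",
"w.requires_grad_(True)\n",
"print(w.requires_grad)"
]
},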
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What you should have learned from this notebook:\n",
"\n",
"- What is a PyTorch Tensor & what is thair advantage compared to plain Numpy arrays?\n",
"- Why should you make Tensoroperations on GPUs\n",
"- What is PyTorch's autograd mechanism and how does it work?\n",
"- How to compute gradients and how to disable gradient tracking\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 0
}