Commit 28bef4de authored by Isabel Haide's avatar Isabel Haide

fix error in autograd nb

parent 9c18ddba
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tensors\n",
"\n",
"Tensors are a specialized data structure that are very similar to arrays and matrices.\n",
"In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.\n",
"\n",
"Tensors are similar to [NumPy’s](https://numpy.org/) ndarrays, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and\n",
"NumPy arrays can often share the same underlying memory, eliminating the need to copy data. Tensors\n",
"are also optimized for automatic differentiation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"import torch\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initializing a Tensor\n",
"\n",
"Tensors can be initialized in various ways. Take a look at the following examples:\n",
"\n",
"**Directly from data**\n",
"\n",
"Tensors can be created directly from data. The data type is automatically inferred.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"data = [[1, 2],[3, 4]]\n",
"x_data = torch.tensor(data)"
]
},
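{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data type was inferred from the Python integers above, so this should be `torch.int64` (a quick check):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_data.dtype"
]
},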
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From a NumPy array**\n",
"\n",
"Tensors can be created from NumPy arrays.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"np_array = np.array(data)\n",
"x_np = torch.from_numpy(np_array)"
]
},
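{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned above, a tensor created with `torch.from_numpy` shares its underlying memory with the NumPy array (both live on the CPU here), so changes to one are visible in the other. A quick check, assuming the cells above were run in order:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np_array[0, 0] = 99\n",
"print(x_np)      # the change made to the NumPy array is visible in the tensor\n",
"x_np[1, 1] = -1\n",
"print(np_array)  # ...and changes to the tensor are visible in the array"
]
},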
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From another tensor:**\n",
"\n",
"The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"x_ones = torch.ones_like(x_data) # retains the properties of x_data\n",
"x_ones"
]
},
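{
"cell_type": "markdown",
"metadata": {},
"source": [
"The properties can also be overridden explicitly. As a short example, `torch.rand_like` needs a floating-point datatype, so we override the integer datatype of `x_data`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data\n",
"x_rand"
]
},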
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create different PyTorch tensors and try to use the Numpy operations and manipulations you learnt in the previous Numpy tutorial. Nearly all operations are applicable analogously."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code goes here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Attributes of a Tensor & its Device\n",
"\n",
"Tensor attributes describe their shape, datatype, and the device on which they are stored.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"tensor = torch.rand(3,4)\n",
"\n",
"print(f\"Shape of tensor: {tensor.shape}\")\n",
"print(f\"Datatype of tensor: {tensor.dtype}\")\n",
"print(f\"Device tensor is stored on: {tensor.device}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the tensor is currently stored on `cpu`. If you want to switch the device of your tensor zo e.g. a GPU, you can to that with the `.to()` operation. GPU devices are enumerated starting from zero: `cuda:0`, `cuda:1`, ... \n",
"\n",
"In the scope of this Blockseminar you will get access to a GPU, if you do not have access yet, the following lines are there to show you the concept of PyTorch devices and might throw errors.\n",
"\n",
"Create a tensor directly on the GPU:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = torch.randn((2,4), device='cuda:0')\n",
"data.device"
]
},
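{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whether the cell above works depends on your machine. You can check whether PyTorch sees a GPU at all (on a CPU-only machine this prints `False` and `0`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible to PyTorch\n",
"print(torch.cuda.device_count())  # number of visible GPUs (enumerated as cuda:0, cuda:1, ...)"
]
},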
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Move it back and forth. `.cpu()` is a shortcut for `.to('cpu)`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.to('cpu')\n",
"data.device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.to('cuda:0')\n",
"data.device"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = data.cpu()\n",
"data.device"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Automatic Differentiation with ``torch.autograd``\n",
"\n",
"When training a model, e.g. a neural network, the most frequently used algorithm is\n",
"**back propagation**. In this algorithm, parameters (model weights) are\n",
"adjusted according to the **gradient** of the loss function with respect\n",
"to the given parameter.\n",
"\n",
"To compute those gradients, PyTorch has a built-in differentiation engine\n",
"called ``torch.autograd``. It supports automatic computation of gradient for any\n",
"computational graph.\n",
"\n",
"Consider a simple linear function, with input ``x``,\n",
"parameters ``w`` and ``b``, and loss function which computes the binary cross entropy (CE) between the a target `y` and the output `z` of our linear function. It can be defined in\n",
"PyTorch in the following manner:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.ones(5) # input tensor\n",
"y = torch.zeros(3) # expected output\n",
"w = torch.randn(5, 3, requires_grad=True)\n",
"b = torch.randn(3, requires_grad=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that we told PyTorch that we want to compute gradients for the parameters `w` and `b` by stating `requires_grad=True`! \n",
"Now we can start operation with our tensors while PyTorch will track each operation which is somehow connected to `w` or `b`.\n",
"This means that PyTorch creates a computational graph and stores to compute gradients later.\n",
"Not let's compose a linear operation and loss computation to create our graph.\n",
"\n",
"This step is in general called `forward pass`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This code defines the following **computational graph**:\n",
"\n",
"![yolo](figures/comp-graph.png)\n",
"\n",
"\n",
"In this network, ``w`` and ``b`` are **parameters**, which we need to\n",
"optimize. Thus, we need to be able to compute the gradients of the loss\n",
"function with respect to those variables. In order to do that, we set\n",
"the ``requires_grad`` property of those tensors.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Computing Gradients\n",
"\n",
"To optimize weights of parameters in the neural network, we need to\n",
"compute the derivatives of our loss function with respect to parameters,\n",
"namely, we need $\\frac{\\partial loss}{\\partial w}$ and\n",
"$\\frac{\\partial loss}{\\partial b}$ under some fixed values of\n",
"``x`` and ``y``. To compute those derivatives, we call\n",
"``loss.backward()``, and then retrieve the values from ``w.grad`` and\n",
"``b.grad``:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"loss.backward()\n",
"print(w.grad)\n",
"print(b.grad)"
]
},
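{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this small example the gradients can also be written down by hand: with the mean reduction used by `binary_cross_entropy_with_logits`, the gradient with respect to `w` is the outer product of `x` and `(sigmoid(z) - y) / 3`, and the gradient with respect to `b` is `(sigmoid(z) - y) / 3`. A quick sanity check (a minimal sketch, assuming the cells above were run in order and `backward()` was called only once):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# analytical gradients of the mean BCE-with-logits loss for z = x @ w + b\n",
"manual_grad_w = torch.outer(x, torch.sigmoid(z) - y) / len(y)\n",
"manual_grad_b = (torch.sigmoid(z) - y) / len(y)\n",
"print(torch.allclose(w.grad, manual_grad_w))\n",
"print(torch.allclose(b.grad, manual_grad_b))"
]
},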
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You just computed gradients using PyTorch's autograd method. Later, we will use these gradients to update our parameters (here `w` and `b`). \n",
"\n",
"Note that this step is a central point to train your model. Especially for complex and big models holds:\n",
"- Computing gradients is very costly, therefore one should use GPUs which are way faster than CPUs\n",
"- Storing computational graphs needs a lot of RAM (ideally on your GPU), therefore you should have an eye on efficient gradient tracking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Disabling Gradient Tracking\n",
"\n",
"By default, all tensors with ``requires_grad=True`` are tracking their\n",
"computational history and support gradient computation. However, there\n",
"are some cases when we do not need to do that, for example, when we have\n",
"trained the model and just want to apply it to some input data, i.e. we\n",
"only want to do *forward* computations through the network. We can stop\n",
"tracking computations by surrounding our computation code with\n",
"``torch.no_grad()`` block:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"print(z.requires_grad)\n",
"\n",
"with torch.no_grad():\n",
" z = torch.matmul(x, w)+b\n",
"print(z.requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another way to achieve the same result is to use the ``detach()`` method\n",
"on the tensor:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = torch.matmul(x, w)+b\n",
"z_det = z.detach()\n",
"print(z_det.requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are reasons you might want to disable gradient tracking:\n",
" - To mark some parameters in your neural network as **frozen parameters**.\n",
" - To **speed up computations** when you are only doing forward pass, because computations on tensors that do\n",
" not track gradients would be more efficient.\n",
"\n"
]
},
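{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of **frozen parameters** (a minimal sketch reusing the tensors from above): after turning off `requires_grad` for `w`, a new forward and backward pass only produces a gradient for `b`. Note that `.grad` accumulates across `backward()` calls unless you reset it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"w.requires_grad_(False)  # \"freeze\" w: autograd no longer tracks operations on it\n",
"b.grad = None            # reset the gradient accumulated by the earlier backward() call\n",
"\n",
"z = torch.matmul(x, w) + b\n",
"loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)\n",
"loss.backward()\n",
"\n",
"print(w.grad)  # unchanged from before: the frozen w receives no new gradient\n",
"print(b.grad)  # only b received a gradient from this backward pass"
]
},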
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------------\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What you should have learned from this notebook:\n",
"\n",
"- What is a PyTorch Tensor & what is thair advantage compared to plain Numpy arrays?\n",
"- Why should you make Tensoroperations on GPUs\n",
"- What is PyTorch's autograd mechanism and how does it work?\n",
"- How to compute gradients and how to disable gradient tracking\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.17"
}
},
"nbformat": 4,
"nbformat_minor": 4
"nbformat": 4,
"nbformat_minor": 4
}