While learning about convolutional neural networks I came across a statement I hadn't seen before, conv2d.zero_grad(), and was curious which objects can call it. So, starting from a working example, I replaced the two-dimensional convolutional layer with a linear layer, but it always reports an error: "Trying to backward through the graph a second time".

import torch
from torch import nn


def corr2d(X, K):
    h, w = K.shape
    Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y


X = torch.ones((6, 8))
X[:, 2:6] = 0
K = torch.tensor([[1.0, -1.0]])
Y = corr2d(X, K)
conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(1, 2), bias=False)
X = X.reshape((1, 1, 6, 8))
Y = Y.reshape((1, 1, 6, 7))
lr = 3e-2

for i in range(10):
    Y_hat = conv2d(X)
    print(Y_hat)
    l = (Y_hat - Y) ** 2
    conv2d.zero_grad()
    l.sum().backward()
    conv2d.weight.data[:] -= lr * conv2d.weight.grad
    if (i + 1) % 2 == 0:
        print(f"epoch{i + 1},loss{l.sum():.3f}")

# The convolutional version above runs fine
import torch
from torch import nn
X = torch.rand(size = (2,5),requires_grad = True)
true_w = torch.tensor([1,2,3,4,5.]).reshape((X.shape[1],-1))
true_b = torch.zeros(X.shape[0]).reshape((-1,1))
Y = torch.matmul(X,true_w) + true_b
Y += torch.rand(size=(Y.shape))
linear1 = nn.Linear(5,1)
lr = 0.01
for i in range(10):
    Y_hat = linear1(X)
    l = (Y_hat-Y)**2
    print(l)
    linear1.zero_grad()
    l.sum().backward()
    linear1.weight.data[:] -= lr*linear1.weight.grad
    if (i + 1) % 2 == 0:
        print(f"epoch{i+1},loss{l.sum():.3f}")
# Trying to backward through the graph a second time

2 Answers

When you do a forward pass, PyTorch keeps various state around to make backpropagation possible. After backpropagation some of that information is not needed anymore and the memory is freed; this includes the computational graph that was built during the forward pass.

You can use tensor.backward(retain_graph=True) to suppress this behaviour and keep the computational graph around.

retain_graph (bool, optional) – If False [default], the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of create_graph.
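
As a quick illustration (a minimal sketch, not part of the original answer), calling backward() a second time on the same graph only works when the first call keeps the graph alive:

import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()
y.backward(retain_graph=True)  # keep the graph for a second pass
y.backward()                   # works only because of retain_graph=True
print(x.grad)                  # gradients accumulate: tensor([4., 4., 4.])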


Your problem is that you create X with requires_grad=True and then calculate Y from it. You could solve it like this:

X = torch.rand(size = (2,5),requires_grad = True) # <-- set this to False
with torch.no_grad():
    true_w = torch.tensor([1,2,3,4,5.]).reshape((X.shape[1],-1))
    true_b = torch.zeros(X.shape[0]).reshape((-1,1))
    Y = torch.matmul(X,true_w) + true_b
    Y += torch.rand(size=(Y.shape))

But honestly, you normally do not require grads of the inputs. Exceptions are debugging or adversarial training.
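
For completeness, a minimal sketch (assumed layer and shapes, not from the original answer) of that exception, where gradients with respect to the input are wanted on purpose:

import torch
from torch import nn

net = nn.Linear(5, 1)
x = torch.rand(2, 5, requires_grad=True)  # input gradients wanted deliberately
out = net(x).sum()
out.backward()
print(x.grad)  # d(out)/d(x), useful for debugging or adversarial examples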

While your training loop is identical in both cases, the tensors you are using are not. If you notice, for the linear layer your input tensor is:

X = torch.rand(size = (2,5),requires_grad = True)

This X is the reason why you are getting the mentioned error from the backward() call. When you call backward(), to reduce memory usage, the intermediate values are freed from the computation graph immediately after back-propagation. Hence, when you call backward() in the second iteration, you encounter an error since that part of the graph doesn't exist anymore. To bypass this error, you can either do:

l.sum().backward(retain_graph=True)

or modify your input tensor X by:

X = torch.rand(size = (2,5)) # removed requires_grad = True

You can read these threads [1], [2], [3] to get more insight.
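
For reference, a minimal sketch (assumed names and shapes, not from the original answers) that reproduces the failure mode described above: Y carries a graph back to X, that graph is freed on the first backward(), and the second iteration can no longer traverse it:

import torch
from torch import nn

X = torch.rand(2, 5, requires_grad=True)
Y = torch.matmul(X, torch.ones(5, 1))  # Y keeps a graph back to X
layer = nn.Linear(5, 1)
for i in range(2):
    loss = ((layer(X) - Y) ** 2).sum()
    loss.backward()  # second iteration raises "Trying to backward through
                     # the graph a second time"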
