import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()
print(a.requires_grad)
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)
print(b.grad)

After executing this code, a.grad is None even though a.requires_grad is True. But if the line a = a.cuda() is removed, a.grad is populated after the loss backward.
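
For reference, a minimal sketch of the same computation without the a = a.cuda() reassignment, where a stays a leaf tensor and its gradient is populated:

import torch

a = torch.nn.Parameter(torch.ones(5, 5))  # leaf tensor created by the user
b = a
b = b - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad)  # a 5x5 tensor filled with -4 (2 * (a - 3)), instead of None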

1 Answer


When you access a.grad here, PyTorch warns that the .grad attribute of a Tensor that is not a leaf Tensor is being accessed, and that it won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor; if you accessed the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.

import torch

a = torch.nn.Parameter(torch.ones(5, 5))
a = a.cuda()  # a is reassigned to a non-leaf tensor here
print(a.requires_grad)
b = a
b = b - 2
print('a ', a)
print('b ', b)
loss = (b - 1).pow(2).sum()

a.retain_grad() # added this line

loss.backward()
print(a.grad)  # now populated instead of None

That happens because of your line a = a.cuda(), which overrides the original value of a: cuda() returns a new tensor that is the output of an operation on the parameter, so the reassigned a is no longer a leaf tensor and backward() does not populate its .grad.
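
A quick way to see this, assuming a CUDA device is available, is to check .is_leaf:

import torch

a = torch.nn.Parameter(torch.ones(5, 5))
print(a.is_leaf)        # True: created directly by the user
a = a.cuda()            # cuda() builds a new tensor from the parameter
print(a.is_leaf)        # False: the new a is the output of an operation
print(a.requires_grad)  # True, but its .grad stays None after backward()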

You could keep a as a leaf by creating the parameter on the GPU in the first place (note that a bare a.cuda() call without reassignment does not help here: it returns a new tensor and leaves a itself on the CPU):

a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))
# or equivalently
a = torch.nn.Parameter(torch.ones(5, 5).cuda())
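
Putting that together, a minimal sketch (again assuming a CUDA device is available) where a is a leaf on the GPU and a.grad is populated without retain_grad():

import torch

a = torch.nn.Parameter(torch.ones(5, 5, device='cuda'))  # leaf, already on the GPU
b = a - 2
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.is_leaf)  # True
print(a.grad)     # a 5x5 tensor filled with -4, i.e. 2 * (a - 3)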

Or you can explicitly request that the gradient of the non-leaf a be retained:

a.retain_grad() # added this line

Discarding the gradients of intermediate tensors can save a significant amount of memory, so it is good practice to retain gradients only where you actually need them.
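
As a small illustration of that default (shown on the CPU for simplicity), the intermediate b gets no .grad unless you opt in with retain_grad():

import torch

a = torch.nn.Parameter(torch.ones(5, 5))  # leaf
b = a - 2                                 # intermediate (non-leaf)
loss = (b - 1).pow(2).sum()
loss.backward()
print(a.grad is None)  # False: gradients of leaves are kept
print(b.grad is None)  # True: gradients of intermediates are freed by default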
