I am having trouble understanding the conceptual meaning of the grad_outputs option in torch.autograd.grad.
The documentation says:
grad_outputs should be a sequence of length matching output containing the “vector” in Jacobian-vector product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn’t require_grad, then the gradient can be None.
I find this description quite cryptic. What exactly do they mean by a Jacobian-vector product? I know what the Jacobian is, but I am not sure what kind of product they mean here: element-wise, matrix product, something else? I can't tell from my example below.
And why is "vector" in quotes? Indeed, in the example below I get an error when grad_outputs is a vector, but not when it is a matrix.
>>> x = torch.tensor([1.,2.,3.,4.], requires_grad=True)
>>> y = torch.outer(x, x)
Why do we observe the following output, and how was it computed?
>>> y
tensor([[ 1.,  2.,  3.,  4.],
        [ 2.,  4.,  6.,  8.],
        [ 3.,  6.,  9., 12.],
        [ 4.,  8., 12., 16.]], grad_fn=<MulBackward0>)
>>> torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y))
(tensor([20., 20., 20., 20.]),)
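For what it's worth, here is a sanity check I tried, assuming I am reading the output layout of torch.autograd.functional.jacobian correctly (output shape followed by input shape, so 4x4x4 here). Summing the full Jacobian over the two output dimensions reproduces the numbers above, so the matrix of ones seems to act as per-entry weights on the Jacobian, but I still can't tell what "product" this is in general:
>>> # assuming jacobian stacks its result as (output shape, input shape)
>>> J = torch.autograd.functional.jacobian(lambda t: torch.outer(t, t), x)
>>> J.shape
torch.Size([4, 4, 4])
>>> J.reshape(-1, 4).sum(dim=0)  # sum over all 16 output entries
tensor([20., 20., 20., 20.])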
However, why does the following produce an error?
>>> torch.autograd.grad(y, x, grad_outputs=torch.ones_like(x))
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([4]) and output[0] has a shape of torch.Size([4, 4]).
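As one more experiment, I tried a non-uniform grad_outputs of the same shape as y (re-creating y first, since the earlier successful call has already freed the graph). The call goes through and the result changes accordingly, but I still can't map it onto a concrete product operation:
>>> y = torch.outer(x, x)  # rebuild the graph before calling grad again
>>> torch.autograd.grad(y, x, grad_outputs=torch.eye(4))
(tensor([2., 4., 6., 8.]),)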