I am trying to compute multiple loss gradients efficiently (without a for loop) in PyTorch. Given:
import torch
from torch import nn
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Sequential(
            nn.Linear(input_size, 16, bias=False),
            nn.Linear(16, output_size, bias=False),
        )

    def forward(self, x):
        return self.linear(x)
device = "cpu"
input_size = 2
output_size = 2
x = torch.randn(10, 1, input_size).to(device)
y = torch.randn(10, 1, output_size).to(device)
model = NeuralNetwork().to(device)
loss_fn = nn.MSELoss()
def loss_grad(x, label):
    y = model(x)
    loss = loss_fn(y, label)
    grads = torch.autograd.grad(loss, model.parameters(), retain_graph=True)
    return grads
The following works, but uses a for loop:
# inefficient but works
def compute_for():
    grads = [loss_grad(x[i], y[i]) for i in range(x.shape[0])]
    print(grads)

compute_for()
For efficiency, I tried using torch.vmap instead:
# potentially more efficient but doesn't work
def compute_vmap():
    grads = torch.vmap(loss_grad)(x, y)
    print(grads)

compute_vmap()
I was expecting it to compute the gradients of each loss w.r.t. the parameters for every element of x and y. Instead, I get an error:

RuntimeError: element 0 of tensors does not require grad

As I understand it, this means that the slices of x that vmap passes into loss_grad don't individually require grad, so autograd has nothing to differentiate through.
How can I modify this code so that it computes all gradients? Or is there another method to do that?
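In case it helps clarify what I'm after, here is a rough, untested sketch of a functional-style version based on the per-sample-gradient recipe in the torch.func docs. The use of functional_call, grad, and the in_dims layout are assumptions on my part, and I haven't verified that this is the right fit for my model:

import torch
from torch.func import functional_call, grad, vmap

# Untested sketch: make the parameters an explicit argument so that grad()
# can differentiate each per-sample loss w.r.t. them.
params = {name: p.detach() for name, p in model.named_parameters()}

def per_sample_loss(params, x_i, y_i):
    pred = functional_call(model, params, (x_i,))
    return loss_fn(pred, y_i)

# grad(...) differentiates w.r.t. the first argument (params);
# in_dims=(None, 0, 0) maps over the batch dimension of x and y only.
per_sample_grads = vmap(grad(per_sample_loss), in_dims=(None, 0, 0))(params, x, y)

Is something along these lines the intended approach, or can the torch.vmap(loss_grad) version above be fixed directly?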