
I am looking for the most efficient way to get the Jacobian of a function through PyTorch and have so far come up with the following solutions:

# Setup
import torch
from torch.autograd import Variable, grad
from time import time

def func(X):
    # Three outputs per row: sums of squares, cubes and fourth powers
    return torch.stack((X.pow(2).sum(1),
                        X.pow(3).sum(1),
                        X.pow(4).sum(1)), 1)

X = Variable(torch.ones(1, int(1e5)) * 2.00094, requires_grad=True).cuda()
# Solution 1: one grad() call per output component
t = time()
Y = func(X)
J = torch.zeros(3, int(1e5))

for i in range(3):
    J[i] = grad(Y[0][i], X, create_graph=True, retain_graph=True, allow_unused=True)[0]

print(time()-t)
>>> Output: 0.002 s
# Solution 2: a single backward pass over repeated inputs
def Jacobian(f,X):
    X_batch = Variable(X.repeat(3,1), requires_grad=True)
    f(X_batch).backward(torch.eye(3).cuda(), retain_graph=True)
    return X_batch.grad

t = time()
J2 = Jacobian(func,X)
print(time()-t)
>>> Output: 0.001 s

Since there does not seem to be a big difference between using a loop in the first solution and the batched call in the second, I wanted to ask whether there might still be a faster way to calculate a Jacobian in PyTorch.

My other question is what the most efficient way to calculate the Hessian might be.

Finally, does anyone know whether something like this can be done more easily or more efficiently in TensorFlow?


3 Answers


functorch can speed up these computations even further. For example, this code from the functorch docs computes a batched Jacobian (Hessians work too):

import torch
import torch.nn.functional as F
from functorch import vmap, jacrev

batch_size = 64
Din = 31
Dout = 33

weight = torch.randn(Dout, Din)
print(f"weight shape = {weight.shape}")
bias = torch.randn(Dout)

def predict(weight, bias, x):
    return F.linear(x, weight, bias).tanh()

x = torch.randn(batch_size, Din)
# vmap over the batch dimension of x; jacrev differentiates w.r.t. x (argnums=2)
compute_batch_jacobian = vmap(jacrev(predict, argnums=2), in_dims=(None, None, 0))
batch_jacobian0 = compute_batch_jacobian(weight, bias, x)  # shape (batch_size, Dout, Din)
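
The same pattern extends to Hessians by composing forward- and reverse-mode Jacobians, as the functorch docs also show. A minimal sketch reusing the names above (the shape comment assumes the sizes defined there):

from functorch import jacfwd

compute_batch_hessian = vmap(jacfwd(jacrev(predict, argnums=2), argnums=2),
                             in_dims=(None, None, 0))
batch_hessian0 = compute_batch_hessian(weight, bias, x)  # (batch_size, Dout, Din, Din)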

The most efficient method is likely to use PyTorch's own inbuilt functions:

torch.autograd.functional.jacobian(func, x)
torch.autograd.functional.hessian(func, x)
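
For instance, applied to the func from the question, a minimal sketch (a small input is used here so the dense results fit in memory, and the scalar reduction for the Hessian is an assumption of this example, since hessian expects a scalar-valued function):

x_small = torch.ones(1, 10) * 2.00094

# Full Jacobian of the vector-valued func: shape (1, 3, 1, 10)
J = torch.autograd.functional.jacobian(func, x_small)

# hessian() requires a scalar output, so sum the outputs for illustration
H = torch.autograd.functional.hessian(lambda x: func(x).sum(), x_small)  # (1, 10, 1, 10)

On recent PyTorch versions, jacobian also accepts an experimental vectorize=True argument that batches the backward calls and can speed things up further.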

I had a similar problem which I solved by defining the Jacobian manually (calculating the derivatives by hand). For my problem this was feasible, but I can imagine that is not always the case. Compared to the second solution, the computation is several times faster on my machine (CPU).

# Solution 2 (from the question), for comparison
def Jacobian(f,X):
    X_batch = Variable(X.repeat(3,1), requires_grad=True)
    f(X_batch).backward(torch.eye(3).cuda(),  retain_graph=True)
    return X_batch.grad

%timeit Jacobian(func,X)
11.7 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Solution 3: the Jacobian of func written out by hand
def J_func(X):
    return torch.stack((2 * X,
                        3 * X.pow(2),
                        4 * X.pow(3)), 1)

%timeit J_func(X)
539 µs ± 24.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • But if instead we have a neural network it wouldn't be possible to write the jacobian manually – Alejandro Aug 09 '19 at 11:24
  • That is not the question and you are wrong. A neural network is a function that consists of math operations, so you can just write the Jacobian function manually. I used this in my research and this is by far the most efficient way I found to compute many Jacobians while training. – Ricoter Oct 24 '19 at 13:39
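
To illustrate the point made in the last comment: for a single tanh layer y = tanh(Wx + b), the Jacobian with respect to x is diag(1 - y^2) W, which can be written out directly and checked against autograd. A minimal sketch with made-up weights (not code from the answer above):

import torch

W = torch.randn(3, 5)   # hypothetical layer weights
b = torch.randn(3)

def layer(x):
    return torch.tanh(W @ x + b)

def layer_jacobian(x):
    # d tanh(Wx + b) / dx = diag(1 - tanh(Wx + b)**2) @ W
    y = layer(x)
    return (1 - y.pow(2)).unsqueeze(1) * W   # shape (3, 5)

x = torch.randn(5)
print(torch.allclose(layer_jacobian(x),
                     torch.autograd.functional.jacobian(layer, x),
                     atol=1e-6))  # True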