
I'd like to compute the first- and second-order derivatives of a neural network with respect to its input. As a toy model, I defined a custom layer with a method called "derivative" that does exactly this:

import torch
from torch import nn
from torch.autograd import grad

class Exponential(nn.Module):

    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(nn.init.uniform_(torch.empty(1), 0.1, 2.0), requires_grad=True)

    def forward(self, x):
        return torch.exp(-self.beta*x**2)

    def derivative(self, X):
        # define zero-vectors for first and second order derivatives
        grads_1, grads_2 = torch.zeros_like(X), torch.zeros_like(X)
        # loop over elements of the input vector 
        for i in range(X.shape[0]):
            x = X[i].requires_grad_(True)
            y = self.forward(x).requires_grad_(True)
            # compute first-order derivative
            grads_1[i] = grad(y, x, create_graph=True, retain_graph=True)[0]
            # compute second-order derivative
            grads_2[i] = grad(grads_1[i], x, create_graph=True, retain_graph=True)[0]
        return grads_1, grads_2
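
The method is then called on a whole vector of inputs, for example (the input X here is just an illustrative choice):

model = Exponential()
X = torch.linspace(-2.0, 2.0, 100)
# loop-based first and second derivatives of the layer w.r.t. each input element
grads_1, grads_2 = model.derivative(X)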

I compared against finite-difference calculations of the derivatives and the results seem correct. However, the computation is very slow! How would one go about speeding up the calculation, perhaps by vectorising the loop? Such vectorisation is possible using the "backward" method, e.g. see the answer to this question; however, I could not generalise that approach to second-order derivatives.
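
For illustration, the kind of vectorised first-order computation I have in mind looks roughly like this (reusing the model and X from above); because the layer acts elementwise, the output can be summed before differentiating, which gives all first derivatives in a single call. I could not see how to generalise this to the second derivative:

X.requires_grad_(True)
y = model(X)
# each output element depends only on the matching input element,
# so d(sum(y))/dx_i equals dy_i/dx_i
grads_1 = grad(y.sum(), X, create_graph=True)[0]

Thanks.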

Saleh
  • Maybe you can make it work with autograd Hessian: https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.hessian . Your function is not exactly scalar, but you can probably make it since it's just a vectorized scalar exponential – trialNerror Feb 09 '21 at 14:41
  • @trialNerror I don't think vectorisation is possible with Hessian. The function is indeed scalar-valued and depends on one variable only. Using Hessian would just compute the second derivative in one shot instead of computing grads_1 and then grads_2. – Saleh Feb 09 '21 at 15:15
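
For reference, applying torch.autograd.functional.hessian per element, as suggested in the comments, might look roughly like the sketch below (the model instance and the input X are illustrative). Since hessian expects a function with a single-element output, it is called once per input element rather than on the whole vector:

from torch.autograd.functional import hessian

model = Exponential()
X = torch.linspace(-2.0, 2.0, 5)
# wrap the layer so it returns a 0-d (scalar) tensor for a scalar input
f = lambda x: model(x).squeeze()
# one hessian call per input element; each call returns a 0-d tensor
grads_2 = torch.stack([hessian(f, X[i]) for i in range(X.shape[0])])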

0 Answers