
I have a neural network that computes a vector quantity u. I'd like to compute the first- and second-order Jacobians with respect to the input x, a single scalar element.

Would anybody know how to do that in PyTorch? Below is the code snippet from my project:

import torch
import torch.nn as nn

class PINN(torch.nn.Module):
    
    def __init__(self, layers:list):
        super(PINN, self).__init__()
        self.linears = nn.ModuleList([])
        for i, dim in enumerate(layers[:-2]):
            self.linears.append(nn.Linear(dim, layers[i+1]))
            self.linears.append(nn.ReLU())
        self.linears.append(nn.Linear(layers[-2], layers[-1]))
        
    def forward(self, x):
        for layer in self.linears:
            x = layer(x)
        return x

I then instantiate my network:

n_in = 1
units = 50
q = 500

pinn = PINN([n_in, units, units, units, q+1])
pinn

Which returns

PINN(
  (linears): ModuleList(
    (0): Linear(in_features=1, out_features=50, bias=True)
    (1): ReLU()
    (2): Linear(in_features=50, out_features=50, bias=True)
    (3): ReLU()
    (4): Linear(in_features=50, out_features=50, bias=True)
    (5): ReLU()
    (6): Linear(in_features=50, out_features=501, bias=True)
  )
)

Then I compute both the first- and second-order Jacobians:

x = torch.randn(1, requires_grad=False)

u_x = torch.autograd.functional.jacobian(pinn, x, create_graph=True)
print("First Order Jacobian du/dx of shape {}, and features\n{}".format(u_x.shape, u_x)

u_xx = torch.autograd.functional.jacobian(lambda _: u_x, x)
print("Second Order Jacobian du_x/dx of shape {}, and features\n{}".format(u_xx.shape, u_xx)

Returns

First Order Jacobian du/dx of shape torch.Size([501, 1]), and features
tensor([[-0.0310],
        [ 0.0139],
        [-0.0081],
        [-0.0248],
        [-0.0033],
        [ 0.0013],
        [ 0.0040],
        [ 0.0273],
        ...
        [-0.0197]], grad_fn=<ViewBackward>)
Second Order Jacobian du_x/dx of shape torch.Size([501, 1, 1]), and features
tensor([[[0.]],

        [[0.]],

        [[0.]],

        [[0.]],

        ...

        [[0.]]])

Shouldn't u_xx be None if it doesn't depend on x?

Thanks in advance

  • Have you checked torch.autograd.functional.jacobian (https://pytorch.org/docs/stable/autograd.html)? – Laurent Bristiel Nov 24 '20 at 06:24
  • I have, and I manage to get the first-order Jacobian effortlessly, but I don't know how to compute the second order. I get a `0`-valued result, which is strange considering the network I have. Is there a way to visualize the graph generated by PyTorch on the fly? – alxyok Nov 24 '20 at 08:07
  • Considering your network is composed of linear and ReLU layers, both of which have constant gradients almost everywhere, it's not really a surprise that the second derivative of the output w.r.t. the input is zero everywhere. That seems correct to me. – jodag Nov 24 '20 at 19:36
  • Yeah, I actually figured that out, but haven't taken time to answer myself! Thanks anyway @jodag. I replaced those with a `nn.Tanh` and got to the second order. Still, I find keeping track of what's populated in the graph quite hard. Computing the jacobian twice is tricky. For some reason, `torch.autograd.functional.jacobian` adds useless dimensions. I have not figured that one out yet. – alxyok Nov 24 '20 at 20:24
  • @alxyok A generalized Jacobian should have shape (output shape x input shape), so the first Jacobian is (501 x 1) because your input x is size 1 and the output of pinn is size 501. The second-order Jacobian (aka the Hessian) will then be (501 x 1 x 1) since the output u_x is size 501 x 1 and the input x is size 1. If you want you can just `.flatten()` u_x before computing the second-order Jacobian. – jodag Nov 24 '20 at 22:28
  • Does this answer your question? [PyTorch most efficient Jacobian/Hessian calculation](https://stackoverflow.com/questions/56480578/pytorch-most-efficient-jacobian-hessian-calculation) – iacob Apr 04 '21 at 11:06
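
A short sketch of the shape bookkeeping described in the comments above, assuming the pinn and x defined in the question (the `.flatten()` trick is the one @jodag suggests):

# A generalized Jacobian has shape (output shape) x (input shape).
u_x = torch.autograd.functional.jacobian(pinn, x, create_graph=True)
print(u_x.shape)   # torch.Size([501, 1]): output (501,) x input (1,)

# Flattening the inner Jacobian before differentiating again yields a
# (501, 1) second-order Jacobian instead of (501, 1, 1).
u_xx = torch.autograd.functional.jacobian(
    lambda inp: torch.autograd.functional.jacobian(pinn, inp, create_graph=True).flatten(),
    x)
print(u_xx.shape)  # torch.Size([501, 1])

With the ReLU network from the question the second-order values are still zero, as explained in the comments; only the shapes are the point here.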

2 Answers


As @jodag mentioned in his comment, ReLU is piecewise linear (zero or identity), so its gradient is piecewise constant (except at 0, which is a measure-zero event) and its second-order derivative is zero almost everywhere. I changed the activation function to Tanh, which finally allows me to compute the Jacobian twice.

Final code is

import torch
import torch.nn as nn

class PINN(torch.nn.Module):
    
    def __init__(self, layers: list):
        super(PINN, self).__init__()
        # Hidden layers: Linear + Tanh (smooth, so second-order derivatives
        # are non-trivial), followed by a final Linear output layer.
        self.linears = nn.ModuleList([])
        for i, dim in enumerate(layers[:-2]):
            self.linears.append(nn.Linear(dim, layers[i+1]))
            self.linears.append(nn.Tanh())
        self.linears.append(nn.Linear(layers[-2], layers[-1]))
        
    def forward(self, x):
        for layer in self.linears:
            x = layer(x)
        return x
        
    def compute_u_x(self, x):
        # First-order Jacobian du/dx; create_graph=True keeps the graph so
        # the result can be differentiated again.
        self.u_x = torch.autograd.functional.jacobian(self, x, create_graph=True)
        self.u_x = torch.squeeze(self.u_x)
        return self.u_x
    
    def compute_u_xx(self, x):
        # Second-order Jacobian d(u_x)/dx, computed by differentiating
        # compute_u_x itself.
        self.u_xx = torch.autograd.functional.jacobian(self.compute_u_x, x)
        self.u_xx = torch.squeeze(self.u_xx)
        return self.u_xx

Then calling compute_u_xx(x) on an instance of PINN with x.requires_grad set to True gets me there. How to get rid of the useless dimensions introduced by torch.autograd.functional.jacobian remains to be understood, though...
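
For reference, a minimal usage sketch, assuming the class above and a size-1 input (the layer sizes are the ones from the question):

pinn = PINN([1, 50, 50, 50, 501])
x = torch.randn(1, requires_grad=True)

u_x = pinn.compute_u_x(x)     # first-order Jacobian, squeezed to shape (501,)
u_xx = pinn.compute_u_xx(x)   # second-order Jacobian, squeezed to shape (501,)
print(u_x.shape, u_xx.shape)  # torch.Size([501]) torch.Size([501])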

– alxyok

The second-order Jacobian is known as the Hessian and can be computed easily using PyTorch's built-in function:

torch.autograd.functional.hessian(func, inputs)
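
Note that torch.autograd.functional.hessian expects a scalar-valued function, so for a vector-valued network such as pinn above you would wrap it first. A minimal sketch, where summing the outputs is only an illustrative choice:

x = torch.randn(1)

# hessian() needs a func returning a single-element tensor, so reduce the
# vector output; .sum() here is just one possible (illustrative) reduction.
H = torch.autograd.functional.hessian(lambda inp: pinn(inp).sum(), x)
print(H.shape)  # torch.Size([1, 1]): (input shape) x (input shape)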
– iacob