
I've drawn samples from a multivariate normal distribution and would like to get the gradient of their log probability with respect to the mean. Since there are many samples, this requires a Jacobian:

import torch

mu = torch.ones((2,), requires_grad=True)
sigma = torch.eye(2)
dist = torch.distributions.multivariate_normal.MultivariateNormal(mu, sigma)

num_samples = 10
samples = dist.sample((num_samples,))
logprobs = dist.log_prob(samples)

Now I would like to get the derivative of each entry in logprobs with respect to each entry in mu.

A simple solution is a python loop:

grads = []
for logprob in logprobs:
    # torch.autograd.grad returns a tuple, so unpack its single element
    grad, = torch.autograd.grad(logprob, mu, retain_graph=True)
    grads.append(grad)

If you stack the grads, the result is the desired Jacobian (a quick sketch of the stacking follows below). Is there also built-in and vectorized support for this?
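For concreteness, a minimal sketch of the stacking step, continuing directly from the loop above (jacobian is just an illustrative name):

# Each grads[i] has shape (2,); stacking gives the (num_samples, 2) Jacobian
jacobian = torch.stack(grads)
print(jacobian.shape)  # torch.Size([10, 2])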

Related questions/internet resources:

This is a huge topic, and there are lots of related posts. Nevertheless, I think that this specific question (regarding distributions) hasn't been answered yet:

  • This question is basically the same as mine (but without example code or a solution attempt); sadly, it is unanswered: Pytorch custom function jacobian gradient

  • This question shows the calculation of a Jacobian in PyTorch, but I don't think the solution is applicable to my problem: Pytorch most efficient Jacobian/Hessian calculation. It requires stacking the input in a way that seems incompatible with the distribution; I couldn't make it work.

  • This gist has some code snippets for Jacobians. In principle they are similar to the approach from the question above.

lhk

  • Don't think it is straightforwardly possible, as torch implements reverse-mode AD and hence expects the output to be a scalar (in your case it is a vector). You could try making mu (10, 2) by replication, and then grad may give you the right Jacobian. – Umang Gupta Oct 30 '19 at 23:57
  • I agree that there probably isn't a straightforward way to get the Jacobian using pytorch's autograd without a loop. That said, in this particular case your Jacobian has a relatively simple closed form which is `grads = torch.mm(samples - mu, torch.inverse(sigma))` (you can use `torch.solve` for a more stable version of this which doesn't directly invert). Depending on the complexity of your actual problem perhaps hard coding an expression for the Jacobian might be the way to go? – jodag Oct 31 '19 at 02:39
  • @jodag, you're right, here the Jacobian is easy to compute. But this is a minimal example for Stack Overflow; the real code is much more complicated. – lhk Oct 31 '19 at 12:41
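For reference, here is a minimal sketch of the closed form jodag suggests in the comments, continuing with the mu, sigma, samples, and grads defined in the question above (the variable name jacobian_closed_form is just illustrative):

# Closed form for a multivariate normal: d log N(x; mu, sigma) / d mu = sigma^{-1} (x - mu)
jacobian_closed_form = torch.mm(samples - mu, torch.inverse(sigma))

# This should match the stacked loop-based gradients from the question
# (up to floating point tolerance)
print(torch.allclose(torch.stack(grads), jacobian_closed_form))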

1 Answer


PyTorch 1.5 introduced the torch.autograd.functional.jacobian function, which computes the Jacobian of a function with respect to its input tensors. Since jacobian expects a Python function as its first argument, using it requires some code restructuring.

import torch

torch.manual_seed(0)    # for repeatable results

mu = torch.ones((2,), requires_grad=True)
sigma = torch.eye(2)
num_samples = 10

def f(mu):
    dist = torch.distributions.multivariate_normal.MultivariateNormal(mu, sigma)
    samples = dist.sample((num_samples,))
    logprobs = dist.log_prob(samples)
    return logprobs

grads = torch.autograd.functional.jacobian(f, mu)

print(grads)
tensor([[-1.1258, -1.1524],
        [-0.2506, -0.4339],
        [ 0.5988, -1.5551],
        [-0.3414,  1.8530],
        [ 0.4681, -0.1577],
        [ 1.4437,  0.2660],
        [ 1.3894,  1.5863],
        [ 0.9463, -0.8437],
        [ 0.9318,  1.2590],
        [ 2.0050,  0.0537]])
jodag
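A small follow-up sketch (not part of the original answer): if you want the Jacobian for a fixed set of samples, as in the question's setup, draw them once outside the function that gets differentiated. The name f_fixed below is purely illustrative; in more recent PyTorch releases, jacobian also accepts a vectorize=True argument that can speed up the computation, although support depends on the operations involved.

import torch

torch.manual_seed(0)

mu = torch.ones((2,))
sigma = torch.eye(2)
num_samples = 10

# Draw the samples once, outside the differentiated function, so the
# Jacobian is taken with respect to mu for a fixed, reusable set of samples.
samples = torch.distributions.multivariate_normal.MultivariateNormal(mu, sigma).sample((num_samples,))

def f_fixed(mu):
    dist = torch.distributions.multivariate_normal.MultivariateNormal(mu, sigma)
    return dist.log_prob(samples)

grads = torch.autograd.functional.jacobian(f_fixed, mu)   # shape: (num_samples, 2)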