You can use PyTorch to compute the gradients of a tensor with respect to another tensor under some constraints. If you are careful to stay within the tensor framework so that a computation graph is created, then you can iteratively query the gradient of each entry of the output tensor by repeatedly calling backward on each of its elements and zeroing the grad member of the independent variable between calls. This way you can gradually build up the gradient of a vector-valued function.

Unfortunately this approach requires calling backward many times, which may be slow in practice and may result in very large matrices.
import torch
from copy import deepcopy

def get_gradient(f, x):
    """ Computes the gradient of tensor f with respect to tensor x. """
    assert x.requires_grad

    x_shape = x.shape
    f_shape = f.shape
    f = f.view(-1)

    x_grads = []
    for f_val in f:
        if x.grad is not None:
            x.grad.data.zero_()
        f_val.backward(retain_graph=True)
        if x.grad is not None:
            x_grads.append(deepcopy(x.grad.data))
        else:
            # in case f isn't a function of x
            x_grads.append(torch.zeros(x.shape).to(x))
    output_shape = list(f_shape) + list(x_shape)
    return torch.cat(x_grads).view(output_shape)
For example, given the following function:

f(x0, x1, x2) = (x0*x1*x2, x1^2, x0 + x2)

the Jacobian at (x0, x1, x2) = (1, 2, 3) could be computed as follows:
x = torch.tensor((1.0, 2.0, 3.0))
x.requires_grad_(True) # must be set before further computation
f = torch.stack((x[0]*x[1]*x[2], x[1]**2, x[0]+x[2]))
df_dx = get_gradient(f, x)
print(df_dx)
which results in the 3x3 Jacobian (row i holds the partial derivatives of the i-th output with respect to x0, x1, x2):

tensor([[6., 3., 2.],
        [0., 4., 0.],
        [1., 0., 1.]])
For your case, if you can define an output tensor with respect to an input tensor, you could use such a function to compute the gradient.
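Note that nothing in get_gradient assumes a 1-D input: the result is reshaped to f.shape + x.shape, so the same helper should also handle, say, a matrix-valued input. Here is a quick sketch reusing get_gradient from above (the matrix X and the two stacked outputs are made up purely for illustration):

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
f = torch.stack((X.sum(), (X**2).sum()))  # two scalar outputs stacked into a vector
dF_dX = get_gradient(f, X)
print(dF_dX.shape)  # torch.Size([2, 2, 2]), i.e. f.shape + X.shape
print(dF_dX[1])     # d(sum(X^2))/dX = 2*X -> [[2., 4.], [6., 8.]]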
A useful feature of PyTorch is the ability to compute the vector-Jacobian product. The previous example required many re-applications of the chain rule (a.k.a. backpropagation) via the backward method to build the Jacobian directly. But PyTorch also lets you compute the product of the Jacobian with an arbitrary vector, which is much more efficient than materializing the Jacobian itself. This may be more in line with what you're looking for, since you can finagle it to compute multiple gradients at various values of a function, similar to the way I believe numpy.gradient operates.
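Concretely, a vector-Jacobian product against an arbitrary vector takes just one backward call. Here is a minimal sketch reusing the f from the Jacobian example above (v is arbitrary; I picked a standard basis vector so the result is easy to check against the Jacobian printed earlier):

x = torch.tensor((1.0, 2.0, 3.0), requires_grad=True)
f = torch.stack((x[0]*x[1]*x[2], x[1]**2, x[0]+x[2]))
v = torch.tensor((1.0, 0.0, 0.0))  # arbitrary vector; this choice selects the first row of J
f.backward(v)                      # a single backward pass computes v^T J
print(x.grad)                      # tensor([6., 3., 2.]), matching the first row of the Jacobian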
For example, here we compute f(x) = x^2 + sqrt(x) for x = 1, 1.1, ..., 1.7 and compute the derivative (which is f'(x) = 2x + 0.5/sqrt(x)) at each of these points:
dx = 0.1
x = torch.arange(1, 1.8, dx, requires_grad=True)
f = x**2 + torch.sqrt(x)
f.backward(torch.ones(f.shape))
x_grad = x.grad
print(x_grad)
which results in
tensor([2.5000, 2.6767, 2.8564, 3.0385, 3.2226, 3.4082, 3.5953, 3.7835])
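Since f is applied elementwise here, the Jacobian is diagonal, and multiplying it by a vector of ones picks out exactly the per-element derivatives. As a quick sanity check (assuming x and x_grad from the snippet above are still in scope), you can compare against the closed-form derivative:

with torch.no_grad():
    expected = 2*x + 0.5/torch.sqrt(x)   # analytic derivative f'(x)
print(torch.allclose(x_grad, expected))  # True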
Compare this to numpy.gradient:

import numpy as np

dx = 0.1
x_np = np.arange(1, 1.8, dx)
f_np = x_np**2 + np.sqrt(x_np)
x_grad_np = np.gradient(f_np, dx)
print(x_grad_np)
which results in the following finite-difference approximation (note that the endpoints, where numpy.gradient falls back to one-sided differences, deviate the most from the exact values above):
[2.58808848 2.67722558 2.85683288 3.03885421 3.22284723 3.40847554 3.59547805 3.68929417]