You can use PyTorch to compute the gradients of a tensor with respect to another tensor under some constraints. If you are careful to stay within the tensor framework so that a computation graph is created, then you can iteratively query the gradient of each entry of the output tensor by repeatedly calling backward on each of its elements and zeroing the grad member of the independent variable between calls. This way you can gradually build up the gradient of a vector-valued function.

Unfortunately this approach requires calling backward many times, which may be slow in practice and may result in very large matrices.
import torch
from copy import deepcopy

def get_gradient(f, x):
    """ Computes the gradient of tensor f with respect to tensor x. """
    assert x.requires_grad

    x_shape = x.shape
    f_shape = f.shape
    f = f.view(-1)

    x_grads = []
    for f_val in f:
        if x.grad is not None:
            x.grad.data.zero_()
        f_val.backward(retain_graph=True)
        if x.grad is not None:
            x_grads.append(deepcopy(x.grad.data))
        else:
            # in case f isn't a function of x
            x_grads.append(torch.zeros(x.shape).to(x))
    output_shape = list(f_shape) + list(x_shape)
    return torch.cat(x_grads).view(output_shape)
For example, given the following function:

f(x0, x1, x2) = (x0*x1*x2, x1^2, x0 + x2)

the Jacobian at (x0, x1, x2) = (1, 2, 3) could be computed as follows:
x = torch.tensor((1.0, 2.0, 3.0))
x.requires_grad_(True) # must be set before further computation
f = torch.stack((x[0]*x[1]*x[2], x[1]**2, x[0]+x[2]))
df_dx = get_gradient(f, x)
print(df_dx)
which results in the 3x3 Jacobian (row i holds the partial derivatives of the i-th output with respect to x0, x1, x2):

tensor([[6., 3., 2.],
        [0., 4., 0.],
        [1., 0., 1.]])
For your case, if you can define an output tensor with respect to an input tensor, you could use such a function to compute the gradient.
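Note that nothing in get_gradient assumes a 1-D input: the result is reshaped to f.shape + x.shape, so the same helper should also handle, say, a matrix-valued input. Here is a quick sketch reusing get_gradient from above (the matrix X and the two stacked outputs are made up purely for illustration):

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
f = torch.stack((X.sum(), (X**2).sum()))  # two scalar outputs stacked into a vector
dF_dX = get_gradient(f, X)
print(dF_dX.shape)  # torch.Size([2, 2, 2]), i.e. f.shape + X.shape
print(dF_dX[1])     # d(sum(X^2))/dX = 2*X -> [[2., 4.], [6., 8.]]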
A useful feature of PyTorch is the ability to compute the vector-Jacobian product. The previous example required many re-applications of the chain rule (a.k.a. backpropagation) via the backward method to build the Jacobian directly. But PyTorch also lets you compute the product of the Jacobian with an arbitrary vector, which is much more efficient than materializing the Jacobian itself. This may be more in line with what you're looking for, since you can finagle it to compute multiple gradients at various values of a function, similar to the way I believe numpy.gradient operates.
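Concretely, a vector-Jacobian product against an arbitrary vector takes just one backward call. Here is a minimal sketch reusing the f from the Jacobian example above (v is arbitrary; I picked a standard basis vector so the result is easy to check against the Jacobian printed earlier):

x = torch.tensor((1.0, 2.0, 3.0), requires_grad=True)
f = torch.stack((x[0]*x[1]*x[2], x[1]**2, x[0]+x[2]))
v = torch.tensor((1.0, 0.0, 0.0))  # arbitrary vector; this choice selects the first row of J
f.backward(v)                      # a single backward pass computes v^T J
print(x.grad)                      # tensor([6., 3., 2.]), matching the first row of the Jacobian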
For example, here we compute f(x) = x^2 + sqrt(x) for x = 1, 1.1, ..., 1.7 and compute the derivative (which is f'(x) = 2x + 0.5/sqrt(x)) at each of these points:
dx = 0.1
x = torch.arange(1, 1.8, dx, requires_grad=True)
f = x**2 + torch.sqrt(x)
f.backward(torch.ones(f.shape))
x_grad = x.grad
print(x_grad)
which results in
tensor([2.5000, 2.6767, 2.8564, 3.0385, 3.2226, 3.4082, 3.5953, 3.7835])
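Since f is applied elementwise here, the Jacobian is diagonal, and multiplying it by a vector of ones picks out exactly the per-element derivatives. As a quick sanity check (assuming x and x_grad from the snippet above are still in scope), you can compare against the closed-form derivative:

with torch.no_grad():
    expected = 2*x + 0.5/torch.sqrt(x)   # analytic derivative f'(x)
print(torch.allclose(x_grad, expected))  # True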
Compare this to numpy.gradient:

import numpy as np

dx = 0.1
x_np = np.arange(1, 1.8, dx)
f_np = x_np**2 + np.sqrt(x_np)
x_grad_np = np.gradient(f_np, dx)
print(x_grad_np)
which results in the following finite-difference approximation (note that the endpoints, where numpy.gradient falls back to one-sided differences, deviate the most from the exact values above):
[2.58808848 2.67722558 2.85683288 3.03885421 3.22284723 3.40847554 3.59547805 3.68929417]