
I want to use PyTorch to get the partial derivatives of the output with respect to the inputs. Suppose I have a function Y = 5*x1^4 + 3*x2^3 + 7*x1^2 + 9*x2 - 5, I train a network to approximate this function, and then I use autograd to calculate dY/dx1 and dY/dx2:

import torch

net = torch.load('net_723.pkl')  # the previously trained network
# requires_grad only works on floating-point tensors, so build x as float
x = torch.tensor([[1.0, -1.0]], requires_grad=True)
y = net(x)
grad_c = torch.autograd.grad(y, x, create_graph=True, retain_graph=True)[0]

Then I get an incorrect derivative:

>>>tensor([[ 7.5583, -5.3173]])

But when I compute Y directly from the formula, I get the right answer:

Y = 5*x[0,0]**4 + 3*x[0,1]**3 + 7*x[0,0]**2 + 9*x[0,1] - 5
grad_c = torch.autograd.grad(Y,x,create_graph=True,retain_graph=True)[0]
>>>tensor([[ 34.,  18.]])

Why does this happen?


1 Answer


A neural network is a universal function approximator. That means that, given enough computational resources, training time, nodes, etc., it can approximate any function.
Without any further information on how you trained your network in the first example, I would suspect that your network simply does not fit the underlying function well enough, meaning that its internal representation actually models a different function!
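
One way to test this suspicion is to compare the network's autograd gradients against the analytic gradients at a batch of points. Here is a minimal sketch of that check; it assumes your saved model 'net_723.pkl' maps an (N, 2) batch to an (N, 1) output and reuses the polynomial from the question:

import torch

net = torch.load('net_723.pkl')

# Sample leaf tensors in [-1, 1]^2 so we can take gradients w.r.t. them
x = torch.empty(100, 2).uniform_(-1, 1).requires_grad_(True)
y = net(x).sum()  # summing yields per-row gradients in a single pass
net_grad = torch.autograd.grad(y, x)[0]

# Analytic partials of Y = 5*x1^4 + 3*x2^3 + 7*x1^2 + 9*x2 - 5
xd = x.detach()
true_grad = torch.stack([20 * xd[:, 0]**3 + 14 * xd[:, 0],  # dY/dx1
                         9 * xd[:, 1]**2 + 9],              # dY/dx2
                        dim=1)

print((net_grad - true_grad).abs().max())  # large value => net models a different function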

For the second code snippet, automatic differentiation does give you the exact partial derivatives. It does so via a different method; see another one of my answers on SO, on the topic of AutoDiff/Autograd specifically.
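
To illustrate what "exact" means here (my own sketch, reusing the polynomial from the question): autograd propagates exact derivative rules through every elementary operation, whereas a numerical scheme such as central finite differences only approximates the derivative:

import torch

def f(x1, x2):
    return 5 * x1**4 + 3 * x2**3 + 7 * x1**2 + 9 * x2 - 5

x = torch.tensor([1.0, -1.0], requires_grad=True)
grad_ad = torch.autograd.grad(f(x[0], x[1]), x)[0]  # tensor([34., 18.]), exact

eps = 1e-4  # central differences are only accurate to O(eps^2)
fd_x1 = (f(1 + eps, -1.0) - f(1 - eps, -1.0)) / (2 * eps)
fd_x2 = (f(1.0, -1 + eps) - f(1.0, -1 - eps)) / (2 * eps)
print(grad_ad, fd_x1, fd_x2)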

dennlinger
  • Thanks for your reply. The net I trained has four layers with 200 nodes per layer; the training data are random numbers between -1 and 1, and I use the math function to calculate the targets. The MSE loss is approximately 3e-4. I thought this fit was correct, but I got the wrong answer through autograd. – upc_lihao Jul 25 '18 at 00:35
  • *How many* samples are you passing to the training? Also, you can try plotting a function generated by your network against the ground truth (`np.linspace` is your friend here; then just evaluate your network on that; see the sketch after these comments). With that high a number of nodes per layer, I suspect that you will completely overfit the given data points and have a high deviation (due to high complexity) outside of that range. A visual example is given on [Wikipedia](https://en.wikipedia.org/wiki/Overfitting). – dennlinger Jul 25 '18 at 06:37
  • Thank you for your reply; your opinion is similar to my teacher's. I used np.linspace to generate 5000 data points, which may be too few. Do you think my way of calculating the derivative between output and input will give the correct answer if the network fits well enough? – upc_lihao Jul 25 '18 at 07:04
  • Well, that depends on what your ultimate goal is (I would personally always use an explicit representation if it is available, as in your case). I would rather suggest shrinking the network (try it with only 32 nodes per layer, or potentially even fewer; that is enough to describe such a function "well enough" IMO, and you can save yourself some training time). Also, make sure to shuffle your data during training, if you're not already doing that. – dennlinger Jul 25 '18 at 07:07
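
A minimal sketch of the plotting check suggested in the comments; the target helper and the x2 = -1 slice are my choices, and it assumes the saved model 'net_723.pkl' maps an (N, 2) batch to an (N, 1) output:

import numpy as np
import torch
import matplotlib.pyplot as plt

net = torch.load('net_723.pkl')  # the trained network from the question

def target(x1, x2):
    return 5 * x1**4 + 3 * x2**3 + 7 * x1**2 + 9 * x2 - 5

# Slice along x1 with x2 fixed at -1, matching the query point in the question
x1 = np.linspace(-1, 1, 200)
grid = torch.tensor(np.stack([x1, np.full_like(x1, -1.0)], axis=1),
                    dtype=torch.float32)
with torch.no_grad():
    pred = net(grid).squeeze(1).numpy()

plt.plot(x1, target(x1, -1.0), label='ground truth')
plt.plot(x1, pred, label='network')
plt.xlabel('x1 (x2 fixed at -1)')
plt.legend()
plt.show()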