
When we talk about automatic differentiation in PyTorch, we are usually shown a graph structure of tensors built from their formulas, and PyTorch computes the gradients by walking back through that graph with the chain rule. However, I want to know what happens at the leaf nodes: does PyTorch hardcode a whole list of basic functions with their analytical derivatives, or does it compute the gradients using numerical methods? A quick example:

import torch

def f(x):
    return x ** 2

x = torch.tensor([1.0], requires_grad=True)
y = f(x)
y.backward()
print(x.grad) # 2.0

In this example, does PyTorch compute the derivative analytically, as $$ (x^2)' = 2x = 2 \cdot 1 = 2, $$ or does it approximate it numerically, in a way similar to $$ (1.00001^2 - 1^2) / (1.00001 - 1) \approx 2 \,? $$

Thanks!

Caprikuarius2
  • The first. See `autograd` (automatic differentiation backend for pytorch). I've been meaning to get through these [slides](https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf) and these [lecture notes](http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/readings/L06%20Automatic%20Differentiation.pdf) and also this ["Automatic differentiation for dummies" presentation by Simon Peyton Jones](https://www.youtube.com/watch?v=FtnkqIsfNQc)... – Mateen Ulhaq Jul 22 '20 at 04:33
  • The second method you mention would probably not be a good idea for a deeper network since there's a ton of floating point error that you'd run into, which I presume would be akin to the vanishing gradients problem but worse. – Mateen Ulhaq Jul 22 '20 at 04:35
  • So the developers of pytorch would have to hard-code the derivative of a lot of simple functions? – Caprikuarius2 Jul 22 '20 at 04:38
  • For functions like `f(x) = x**2`, you can just use rules. CAS like Maple (or even [WolframAlpha](https://www.wolframalpha.com/)) allow you to do symbolic differentiation on functions R -> R^n. How do *you* do differentiation? Do you use a table of rules that you've memorized? Or do you use complicated arguments with epsilons and deltas? (I wonder if CASes can handle those too...) For R^m -> R^n, which autodiff deals with, we can also have a [table of derivatives like this](https://github.com/pytorch/pytorch/blob/5c9918e757b019564e74ae6f676cacfe70a87afd/tools/autograd/derivatives.yaml#L805). – Mateen Ulhaq Jul 22 '20 at 06:06
  • And here's the [power rule](https://github.com/pytorch/pytorch/blob/80d5b3785b88f83eb598e393d0137f045b979c4b/tools/autograd/templates/Functions.cpp#L158), probably hardcoded in C++ for performance reasons. – Mateen Ulhaq Jul 22 '20 at 06:08
  • That makes a lot more sense. Thank you so much! – Caprikuarius2 Jul 22 '20 at 08:01

1 Answer


See this paper for the exact answer, specifically Section 2.1 or Figure 2.

In short, PyTorch keeps a list of basic functions together with the expressions for their derivatives. So what is done in your case (`y = x*x`) is evaluating $$ y' = 2x. $$
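You can reproduce the same pattern in user code: a custom `torch.autograd.Function` supplies the forward computation together with a hand-written analytical backward formula, which is conceptually what PyTorch's built-in ops do internally. A minimal sketch (the `Square` class below is only illustrative):

import torch

class Square(torch.autograd.Function):
    # Forward computes x**2; backward applies the hand-written
    # analytical derivative d(x**2)/dx = 2x.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: upstream gradient times the local derivative.
        return grad_output * 2 * x

x = torch.tensor([1.0], requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor([2.])

Calling `y.backward()` multiplies the incoming gradient by the stored local derivative `2 * x`, which is exactly the chain-rule step applied at each node of the graph.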

The numerical method you mention is called numerical differentiation (or finite differences), and it only approximates the derivative; it is not what PyTorch does.
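Finite differences is still useful as a sanity check: `torch.autograd.gradcheck`, for example, compares analytical gradients against a numerical approximation. A quick sketch contrasting the two on your function (the step size `eps` here is arbitrary):

import torch

def f(x):
    return x ** 2

# Exact gradient from autograd (analytical rule applied via the chain rule).
x = torch.tensor([1.0], requires_grad=True)
f(x).backward()
print(x.grad)  # tensor([2.])

# Finite-difference approximation, as in the question.
eps = 1e-5
x0 = x.detach()
numeric = (f(x0 + eps) - f(x0)) / eps
print(numeric)  # roughly tensor([2.0000]), but only approximate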