
When we talk about automatic differentiation in PyTorch, we are usually shown a graph structure of tensors built from their formulas, and PyTorch computes the gradients by walking back through that graph with the chain rule. However, I want to know what happens at the leaf nodes: does PyTorch hardcode a whole list of basic functions with their analytical derivatives, or does it compute the gradients using numerical methods? A quick example:

import torch

def f(x):
    return x ** 2

x = torch.tensor([1.0], requires_grad=True)
y = f(x)
y.backward()
print(x.grad) # 2.0

In this example, does PyTorch compute the derivative analytically, as $$ (x^2)' = 2x = 2 \cdot 1 = 2, $$ or does it approximate it numerically, in a way similar to $$ (1.00001^2 - 1^2) / (1.00001 - 1) \approx 2 \,? $$

Thanks!

Caprikuarius2
  • The first. See `autograd` (automatic differentiation backend for pytorch). I've been meaning to get through these [slides](https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf) and these [lecture notes](http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/readings/L06%20Automatic%20Differentiation.pdf) and also this ["Automatic differentiation for dummies" presentation by Simon Peyton Jones](https://www.youtube.com/watch?v=FtnkqIsfNQc)... – Mateen Ulhaq Jul 22 '20 at 04:33
  • The second method you mention would probably not be a good idea for a deeper network since there's a ton of floating point error that you'd run into, which I presume would be akin to the vanishing gradients problem but worse. – Mateen Ulhaq Jul 22 '20 at 04:35
  • So the developers of pytorch would have to hard-code the derivative of a lot of simple functions? – Caprikuarius2 Jul 22 '20 at 04:38
  • For functions like `f(x) = x**2`, you can just use rules. CAS like Maple (or even [WolframAlpha](https://www.wolframalpha.com/)) allow you to do symbolic differentiation on functions R -> R^n. How do *you* do differentiation? Do you use a table of rules that you've memorized? Or do you use complicated arguments with epsilons and deltas? (I wonder if CASes can handle those too...) For R^m -> R^n, which autodiff deals with, we can also have a [table of derivatives like this](https://github.com/pytorch/pytorch/blob/5c9918e757b019564e74ae6f676cacfe70a87afd/tools/autograd/derivatives.yaml#L805). – Mateen Ulhaq Jul 22 '20 at 06:06
  • And here's the [power rule](https://github.com/pytorch/pytorch/blob/80d5b3785b88f83eb598e393d0137f045b979c4b/tools/autograd/templates/Functions.cpp#L158), probably hardcoded in C++ for performance reasons. – Mateen Ulhaq Jul 22 '20 at 06:08
  • That makes a lot more sense. Thank you so much! – Caprikuarius2 Jul 22 '20 at 08:01

1 Answer


See this paper for the exact answer, specifically Section 2.1 or Figure 2.

In short, PyTorch keeps a list of basic functions together with the expressions for their derivatives. So what is done in your case (`y = x*x`) is evaluating $$ y' = 2x. $$
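You can reproduce the same pattern in user code: a custom `torch.autograd.Function` supplies the forward computation together with a hand-written analytical backward formula, which is conceptually what PyTorch's built-in ops do internally. A minimal sketch (the `Square` class below is only illustrative):

import torch

class Square(torch.autograd.Function):
    # Forward computes x**2; backward applies the hand-written
    # analytical derivative d(x**2)/dx = 2x.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule: upstream gradient times the local derivative.
        return grad_output * 2 * x

x = torch.tensor([1.0], requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor([2.])

Calling `y.backward()` multiplies the incoming gradient by the stored local derivative `2 * x`, which is exactly the chain-rule step applied at each node of the graph.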

The numerical method you mention is called numerical differentiation (or finite differences), and it only approximates the derivative; it is not what PyTorch does.
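Finite differences is still useful as a sanity check: `torch.autograd.gradcheck`, for example, compares analytical gradients against a numerical approximation. A quick sketch contrasting the two on your function (the step size `eps` here is arbitrary):

import torch

def f(x):
    return x ** 2

# Exact gradient from autograd (analytical rule applied via the chain rule).
x = torch.tensor([1.0], requires_grad=True)
f(x).backward()
print(x.grad)  # tensor([2.])

# Finite-difference approximation, as in the question.
eps = 1e-5
x0 = x.detach()
numeric = (f(x0 + eps) - f(x0)) / eps
print(numeric)  # roughly tensor([2.0000]), but only approximate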