
I haven't been able to find a clear statement of whether TensorFlow uses automatic or symbolic differentiation.

I skimmed the TensorFlow paper and they mention automatic gradients, but it is unclear whether they just mean symbolic gradients, since they also mention that it has that capability.

Salvador Dali
Alexander Telfar

3 Answers


TF uses automatic differentiation, and more specifically reverse-mode automatic differentiation.


There are 3 popular methods to calculate the derivative:

  1. Numerical differentiation
  2. Symbolic differentiation
  3. Automatic differentiation

Numerical differentiation relies on the definition of the derivative: f'(x) ≈ (f(x + h) − f(x)) / h, where you plug in a very small h and evaluate the function in two places. This is the most basic formula; in practice people use other formulas which give a smaller estimation error, such as the central difference shown below. This way of calculating a derivative is suitable mostly if you do not know your function and can only sample it. It also requires a lot of computation for a high-dimensional function.
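A minimal sketch (plain Python, not TF code) of the central-difference formula, which is one of those more accurate formulas:

def numerical_grad(f, x, h=1e-5):
    # central difference: (f(x+h) - f(x-h)) / (2h), error is O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_grad(lambda x: x ** 2, 3.0))   # ~6.0, the true derivative of x^2 at 3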

Symbolic differentiation manipulates mathematical expressions. If you have ever used MATLAB or Mathematica, then you have seen something like this: an expression goes in, and its derivative comes out as another expression.

Here the system knows the derivative of every elementary expression and uses various rules (product rule, chain rule) to calculate the resulting derivative. It then simplifies the end expression to obtain the final result.
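A minimal sketch of this, assuming SymPy is available (Mathematica and MATLAB's symbolic toolbox behave similarly):

import sympy as sp

x = sp.Symbol('x')
expr = x ** 2 * sp.sin(x)
print(sp.diff(expr, x))   # x**2*cos(x) + 2*x*sin(x): a new expression, not a number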

Automatic differentiation manipulates blocks of computer programs. A differentiator has the rules for taking the derivative of each element of a program (when you define any op in core TF, you need to register a gradient for this op). It also uses the chain rule to break complex expressions into simpler ones. Here is a good example of how it works in real TF programs with some explanation.
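As an illustration (a sketch in TF 1.x graph mode, using tf.custom_gradient rather than the internal registration mechanism for core ops, which works in the same spirit), here is how you attach a gradient rule to a block of a program that the differentiator then uses via the chain rule:

import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)

    def grad(dy):
        # chain rule: upstream gradient dy times this block's local derivative
        return dy * (e / (1 + e))

    return tf.math.log(1 + e), grad

x = tf.constant(2.0)
y = log1pexp(x)
dy_dx = tf.gradients([y], [x])   # the differentiator uses the rule registered above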


You might think that automatic differentiation is the same as symbolic differentiation (in one place they operate on math expressions, in the other on computer programs). And yes, they are sometimes very similar. But for control-flow statements (`if`, `while`, loops) the results can be very different:

symbolic differentiation leads to inefficient code (unless carefully done) and faces the difficulty of converting a computer program into a single expression
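For example, here is a minimal sketch (plain Python dual numbers, i.e. forward-mode AD rather than TF's reverse mode, but the same idea) of differentiating straight through control flow that would be awkward to turn into a single symbolic expression:

class Dual:
    def __init__(self, val, dot):
        # a value together with its derivative with respect to x
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

def f(x):
    # a program, not a single expression: the branch taken depends on runtime values
    y = x
    for _ in range(3):
        y = y * x if y.val > 2.0 else y + x
    return y

y = f(Dual(1.5, 1.0))   # seed with dx/dx = 1
print(y.val, y.dot)     # 6.75 13.5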

Salvador Dali
  • Why does symbolic differentiation lead to inefficient code? – Dec 27 '18 at 11:15
  • For example, while your symbolic computation may rely heavily on quantities you already computed anyway, it wouldn't be smart enough to reuse them and would just recompute them every time they appear in the equation. Think of csc(x) or cos(x-7), which we would already have from the forward computation above. – Karaszka May 21 '19 at 15:14
  • It might have been useful to share the logic behind automatic differentiation, instead of stating it's different from the other two. – Sergey Bushmanov Aug 10 '21 at 20:26

By "automatic differentiation" you may be thinking of "differentiation by finite differences" where you approximate derivative of f(x) as [f(x+e)-f(x-e)]/(2e). However, automatic differentiation is different and the finite difference method is an example of "numerical differentiation".

TensorFlow uses reverse-mode automatic differentiation for its gradients operation, and the finite difference method for tests that check the validity of the gradient operation, like here.

The finite difference method is not practical for high-dimensional problems, whereas reverse-mode automatic differentiation gives you the derivative of a "many -> 1" function at roughly the same cost as computing the original function.
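A quick sketch of that point (TF 1.x graph mode, in the same style as the example below): one call to tf.gradients builds the whole 1000-dimensional gradient, whereas central finite differences would need roughly 2 * 1000 evaluations of the function:

import tensorflow as tf

x = tf.Variable(tf.random_normal([1000]))
loss = tf.reduce_sum(tf.square(x))      # f: R^1000 -> R, a "many -> 1" function
grad = tf.gradients([loss], [x])[0]     # a single reverse pass; result has shape [1000]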

You can see the core of the implementation here.

The implementation of the differentiation method is symbolic in the sense that the gradients operation takes a computational graph and produces a computational graph that can be fed back into the gradients operation to get higher-order derivatives.

Here's an example

tf.reset_default_graph()
x = tf.Variable(0.)
y = tf.square(x)                # forward op: y = x^2
z = tf.gradients([y], [x])      # builds graph nodes computing dy/dx = 2*x

Here's the graph you get

[Graph visualization: the forward Square op and the generated gradient ops in the same graph]

There are some extra operators since the same graph code will work for higher-dimensional x, but the point is that you see both x^2 and 2*x in the same graph.
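And because z is itself just more graph, you can feed it back into tf.gradients to get higher-order derivatives (a sketch continuing the snippet above, TF 1.x):

d2y_dx2 = tf.gradients(z, [x])          # graph computing d^2y/dx^2 = 2

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([z, d2y_dx2]))       # [[0.0], [2.0]] since x was initialized to 0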

Hari
Yaroslav Bulatov
  • Automatic differentiation is not differentiation using finite differences... it's application of the chain rule. – user541686 Jul 27 '16 at 03:25
  • There's a "Square" op, that's your x^2. x*2 is done with `mul` op – Yaroslav Bulatov Nov 17 '16 at 23:59
  • "Automatic differentiation is not symbolic differentiation, nor numerical differentiation (the method of finite differences)." https://en.wikipedia.org/wiki/Automatic_differentiation – pstjohn Apr 18 '17 at 16:03
  • Your link to the implementation is no longer valid. It would be helpful if someone could update this to link to the specific revision which contained this code. – naasking May 14 '20 at 19:31
  • Automatic differentiation is not finite difference. Very helpful. What is it? You'd better read https://en.wikipedia.org/wiki/Automatic_differentiation – Sergey Bushmanov Aug 10 '21 at 20:32

AFAIK, symbolic differentiation means working with a mathematical, symbolic equation (i.e., a symbolic math equation in, the derivative of the equation out). Automatic differentiation computes derivatives based on computational functions (which in turn are broken down into basic operations such as addition/subtraction and multiplication/division).

Since TensorFlow does differentiation based on a computation graph of operations, I'd intuitively say that it's automatic differentiation (I don't know of any other technique that would be appropriate here; the possibility that TensorFlow is converting the computation graph into a mathematical equation that is then parsed to compute the derivative of that equation is probably out of the question). The authors do say "symbolic differentiation" in the TensorFlow whitepaper, though; however, I think this may be a misnomer, similar to saying "Tensor" instead of "(multi-dimensional) data array" if you'd ask a mathematician.