Questions tagged [automatic-differentiation]

Also known as algorithmic differentiation, or AD for short. Techniques that take a procedure evaluating a numerical function and transform it into a procedure that additionally evaluates directional derivatives, gradients, or higher-order derivatives.

Techniques include

  • operator overloading for dual numbers (see the sketch after this list),
  • operator overloading to extract the operation sequence as a tape,
  • code analysis and transformation.
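
A minimal Python sketch of the dual-number approach, assuming nothing beyond operator overloading; the Dual class and the function f below are illustrative and not taken from any particular library.

    class Dual:
        """Value of the form a + b*eps with eps**2 == 0: carries f and f' together."""
        def __init__(self, val, dot=0.0):
            self.val = val   # function value
            self.dot = dot   # derivative propagated alongside the value

        def _wrap(self, other):
            return other if isinstance(other, Dual) else Dual(other)

        def __add__(self, other):
            other = self._wrap(other)
            return Dual(self.val + other.val, self.dot + other.dot)
        __radd__ = __add__

        def __mul__(self, other):
            other = self._wrap(other)
            # product rule: (u*v)' = u'*v + u*v'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)
        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x + 1   # f'(x) = 6*x + 2

    y = f(Dual(2.0, 1.0))   # seed the derivative direction dx/dx = 1
    print(y.val, y.dot)     # 17.0 14.0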

For a function with input of dimension n and output of dimension m, requiring L elementary operations for its evaluation, one directional derivative (forward mode) or the gradient of one scalar output (reverse mode) can be computed with about 3*L operations.

The accuracy of the derivative is, automatically, nearly as good as the accuracy of the function evaluation.

Other differentiation methods are

  • symbolic differentiation, where a closed-form expression for the derivative is built first, which can grow very large depending on the expression and the implementation, and
  • numerical differentiation by divided differences, which gives less accuracy for comparable effort, or comparable accuracy at higher effort (see the numeric illustration after this list).
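
As a small, self-contained Python illustration of the second point: central differences lose accuracy once the step gets too small, whereas AD has no step size to tune. The function, evaluation point, and step sizes below are arbitrary examples.

    import math

    def f(x):
        return math.sin(x)

    def central_diff(f, x, h):
        return (f(x + h) - f(x - h)) / (2 * h)

    x = 1.0
    exact = math.cos(x)                    # d/dx sin(x) = cos(x)
    for h in (1e-1, 1e-4, 1e-8, 1e-12):
        err = abs(central_diff(f, x, h) - exact)
        print(f"h={h:.0e}  error={err:.2e}")
    # Truncation error shrinks as h decreases, but rounding error in the
    # difference grows again for very small h; AD avoids this trade-off by
    # propagating exact derivative rules through the evaluation.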

See Wikipedia and autodiff.org.

192 questions
78
votes
2 answers

What does the parameter retain_graph mean in the Variable's backward() method?

I'm going through the neural transfer PyTorch tutorial and am confused about the use of retain_variable (deprecated, now referred to as retain_graph). The code example shows: class ContentLoss(nn.Module): def __init__(self, target, weight): …
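
Separate from the tutorial's ContentLoss class, a minimal hedged sketch of what retain_graph controls: a second backward() over the same graph only succeeds if the first call kept the intermediate buffers alive.

    import torch

    x = torch.ones(3, requires_grad=True)
    y = (x * x).sum()

    y.backward(retain_graph=True)   # keep the graph for another backward pass
    print(x.grad)                   # tensor([2., 2., 2.])

    x.grad.zero_()                  # gradients accumulate, so clear them first
    y.backward()                    # only works because retain_graph=True above
    print(x.grad)                   # tensor([2., 2., 2.])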
59
votes
3 answers

Why don't C++ compilers do better constant folding?

I'm investigating ways to speed up a large section of C++ code, which has automatic derivatives for computing jacobians. This involves doing some amount of work in the actual residuals, but the majority of the work (based on profiled execution time)…
36
votes
3 answers

Difference between symbolic differentiation and automatic differentiation?

I just cannot seem to understand the difference. To me it looks like both just go through an expression and apply the chain rule. What am I missing?
Moody
  • 1,297
  • 2
  • 12
  • 21
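
One hedged way to see the difference (using SymPy for the symbolic side and PyTorch's reverse-mode AD for the other; the expression is an arbitrary example): symbolic differentiation returns another expression, while AD only ever returns derivative values at a point.

    import sympy as sp
    import torch

    # Symbolic: the result is a formula, which can grow large for deep programs.
    x_sym = sp.symbols('x')
    expr = sp.exp(sp.sin(x_sym)) * x_sym
    print(sp.diff(expr, x_sym))      # something like x*exp(sin(x))*cos(x) + exp(sin(x))

    # AD: no formula is ever built; we get the derivative's value at x = 2.0.
    x = torch.tensor(2.0, requires_grad=True)
    y = torch.exp(torch.sin(x)) * x
    y.backward()
    print(x.grad)                    # the same derivative, evaluated at 2.0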
16
votes
1 answer

How to get more performance out of automatic differentiation?

I am having a hard time optimizing a program that relies on ad's conjugateGradientDescent function for most of its work. Basically my code is a translation of an old paper's code that is written in Matlab and C. I have not measured it, but that…
fho
  • 6,787
  • 26
  • 71
14
votes
7 answers

Automatic differentiation library in Scheme / Common Lisp / Clojure

I've heard that one of McCarthy's original motivations for inventing Lisp was to write a system for automatic differentiation. Despite this, my Google searches haven't yielded any libraries/macros for doing this. Are there any Scheme/Common…
14
votes
2 answers

How is backpropagation the same (or not) as reverse automatic differentiation?

The Wikipedia page for backpropagation has this claim: The backpropagation algorithm for calculating a gradient has been rediscovered a number of times, and is a special case of a more general technique called automatic differentiation in the…
12
votes
4 answers

Is there any working implementation of reverse mode automatic differentiation for Haskell?

The closest-related implementation in Haskell I have seen is the forward mode at http://hackage.haskell.org/packages/archive/fad/1.0/doc/html/Numeric-FAD.html. The closest related research appears to be reverse mode for another functional…
Ian Fiske
  • 10,482
  • 3
  • 21
  • 20
11
votes
1 answer

Optimize a list function that creates too much garbage (not stack overflow)

I have a Haskell function that's causing more than 50% of all the allocations of my program and causing 60% of my run time to be taken by the GC. I run with a small stack (-K10K) so there is no stack overflow, but can I make this function faster,…
10
votes
1 answer

Navigating the automatic differentiation ecosystem in Julia

Julia has a somewhat sprawling AD ecosystem, with perhaps by now more than a dozen different packages spanning, as far as I can tell, forward-mode (ForwardDiff.jl, ForwardDiff2.jl), reverse-mode (ReverseDiff.jl, Nabla.jl, AutoGrad.jl), and…
cbk
  • 4,225
  • 6
  • 27
10
votes
1 answer

How to apply gradients manually in PyTorch

Starting to learn pytorch and was trying to do something very simple, trying to move a randomly initialized vector of size 5 to a target vector of value [1,2,3,4,5]. But my distance is not decreasing!! And my vector x just goes crazy. No idea what I…
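
A minimal hedged sketch of manual gradient steps in PyTorch toward the target described in the question; the squared-distance loss, learning rate, and iteration count are arbitrary choices.

    import torch

    target = torch.tensor([1., 2., 3., 4., 5.])
    x = torch.randn(5, requires_grad=True)   # randomly initialised vector of size 5

    lr = 0.1
    for _ in range(200):
        loss = ((x - target) ** 2).sum()
        loss.backward()              # fills x.grad with d(loss)/dx
        with torch.no_grad():        # apply the update outside the autograd graph
            x -= lr * x.grad
        x.grad.zero_()               # gradients accumulate unless cleared

    print(x)                         # close to [1., 2., 3., 4., 5.]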
9
votes
1 answer

How does tensorflow handle non differentiable nodes during gradient calculation?

I understood the concept of automatic differentiation, but couldn't find any explanation of how TensorFlow calculates the error gradient for non-differentiable functions such as tf.where in my loss function or tf.cond in my graph. It works just…
Natjo
  • 2,005
  • 29
  • 75
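
A hedged sketch of the usual behaviour with tf.where: autodiff treats the node as piecewise differentiable, passing the incoming gradient through the selected branch and zero through the other. The variables below are illustrative.

    import tensorflow as tf

    x = tf.Variable([-1.0, 2.0])
    with tf.GradientTape() as tape:
        # piecewise definition; not differentiable at x == 0
        y = tf.where(x > 0.0, x * x, 3.0 * x)
        loss = tf.reduce_sum(y)

    print(tape.gradient(loss, x))    # [3., 4.]: the selected branch's derivative per element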
8
votes
3 answers

Where is Wengert List in TensorFlow?

TensorFlow uses reverse-mode automatic differentiation (reverse-mode AD), as shown in https://github.com/tensorflow/tensorflow/issues/675. Reverse-mode AD needs a data structure called a Wengert list - see…
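
A toy Python sketch of the idea of a Wengert list, i.e. a tape recording each operation together with its local partial derivatives, replayed backwards to accumulate adjoints; the helpers below are illustrative and say nothing about TensorFlow's internals.

    values, tape = [], []   # tape entry: (output_idx, [(input_idx, local_partial), ...])

    def push(v):
        values.append(v)
        return len(values) - 1

    def add(i, j):
        k = push(values[i] + values[j])
        tape.append((k, [(i, 1.0), (j, 1.0)]))
        return k

    def mul(i, j):
        k = push(values[i] * values[j])
        tape.append((k, [(i, values[j]), (j, values[i])]))   # d(xy)/dx = y, d(xy)/dy = x
        return k

    # f(a, b) = a*b + a, evaluated at a = 3, b = 4
    a, b = push(3.0), push(4.0)
    out = add(mul(a, b), a)

    # reverse sweep: walk the tape backwards, accumulating adjoints df/d(value_i)
    adjoint = [0.0] * len(values)
    adjoint[out] = 1.0
    for k, inputs in reversed(tape):
        for i, partial in inputs:
            adjoint[i] += adjoint[k] * partial

    print(adjoint[a], adjoint[b])   # 5.0 3.0  (df/da = b + 1, df/db = a)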
7
votes
2 answers

Numeric.AD and typing problem

I'm trying to work with Numeric.AD and a custom Expr type. I wish to calculate the symbolic gradient of a user-inputted expression. The first trial with a constant expression works nicely: calcGrad0 :: [Expr Double] calcGrad0 = grad df vars where …
aleator
  • 4,436
  • 20
  • 31
7
votes
3 answers

What is differentiable programming?

Native support for differentiable programming has been added to Swift for the Swift for TensorFlow project. Julia has something similar with Zygote. What exactly is differentiable programming? What does it enable? Wikipedia says the programs can be…
joel
  • 6,359
  • 2
  • 30
  • 55
7
votes
1 answer

PyTorch Autograd: what does the runtime error "grad can be implicitly created only for scalar outputs" mean?

I am trying to understand Pytorch autograd in depth; I would like to observe the gradient of a simple tensor after going through a sigmoid function as below: import torch from torch import autograd D = torch.arange(-8, 8, 0.1,…
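
A hedged sketch of the error and two common ways around it, mirroring the question's tensor D but otherwise illustrative.

    import torch

    D = torch.arange(-8, 8, 0.1, requires_grad=True)
    y = torch.sigmoid(D)             # y is a vector, not a scalar

    # y.backward()                   # RuntimeError: grad can be implicitly
    #                                # created only for scalar outputs

    # Option 1: reduce to a scalar first, e.g. by summing.
    y.sum().backward()
    print(D.grad[:3])                # sigmoid'(d) for the first entries

    # Option 2: pass explicit output weights (a vector-Jacobian product seed).
    D.grad = None
    y = torch.sigmoid(D)
    y.backward(torch.ones_like(y))   # equivalent to the sum() above here
    print(D.grad[:3])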