I have begun learning TensorFlow with the official guide: https://www.tensorflow.org/guide.

I am struggling with the part of the guide named "Automatic differentiation", and especially with the section "Took gradients through a stateful object".

I don't understand why the guide says that a stateful object stops gradients. It gives this piece of code:

import tensorflow as tf

x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape() as tape:
  # Update x1 = x1 + x0.
  x1.assign_add(x0)
  # The tape starts recording from x1.
  y = x1**2   # y = (x1 + x0)**2

# This doesn't work.
print(tape.gradient(y, x0))   #dy/dx0 = 2*(x1 + x0)

Why doesn't the tape record x0? Is it because .assign_add(x0), which increments x1, overshadows x0? Is it because assign_add takes the value of x0 and steals its allocated memory? Is that the right reason, or is there another reason that I don't see?

Thank you in advance for your answers.

matebende
Valkar83

1 Answer

Think of it this way: x1 = tf.Variable(0.0) has one state, i.e. one memory location where its value is stored. x1.assign_add(x0) then changes the value stored in x1 in place. However, it does not "attach" any pointer to the memory location of x0. Hence a trace back on the gradient tape stops at x1, because x1 keeps no reference to x0, which was used to update it. See the following definition of state.

Wikipedia: "...a computer program stores data in variables, which represent storage locations in the computer's memory. The contents of these memory locations, at any given point in the program's execution, is called the program's state."

The code below shows two cases:

  • x1.assign_add(x0) where tape cannot reach x0.
  • x1 = x0 where tape can reach x0.

Try:

import tensorflow as tf

# Variables
x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

# Record
with tf.GradientTape() as tape:

  # Case 1: in-place update x1 = x1 + x0
  #x1.assign_add(x0) #<-- Trace back NOT possible
  # Case 2: rebind the Python name x1 to the variable x0
  x1 = x0 #<-- Trace back IS possible

  # With case 2, x1 is x0, so y = x0**2 and the tape can reach x0
  y = x1**2


# Gradient
grad_y_x0 = tape.gradient(y, x0)
print('grad_y_x0:', grad_y_x0)

Output:

grad_y_x0: tf.Tensor(6.0, shape=(), dtype=float32)
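
Note that x1 = x0 only rebinds the Python name, so it computes y = x0**2 rather than (x1 + x0)**2. As a minimal sketch of how the original intent could be kept differentiable, the version below replaces the in-place update with ordinary tensor arithmetic and compares both cases. The helper names grad_stateful and grad_functional are made up for this illustration; they are not part of TensorFlow or the guide.

```python
import tensorflow as tf


def grad_stateful():
    """In-place update: the tape cannot trace y back to x0."""
    x0 = tf.Variable(3.0)
    x1 = tf.Variable(0.0)
    with tf.GradientTape() as tape:
        x1.assign_add(x0)   # changes x1's stored value; not a differentiable op
        y = x1 ** 2         # the tape only sees a fresh read of x1
    return tape.gradient(y, [x0, x1])


def grad_functional():
    """Tensor arithmetic: the addition itself is recorded on the tape."""
    x0 = tf.Variable(3.0)
    x1 = tf.Variable(0.0)
    with tf.GradientTape() as tape:
        x1_new = x1 + x0    # a new tensor connected to both variables
        y = x1_new ** 2     # y = (x1 + x0)**2
    return tape.gradient(y, [x0, x1])


stateful = grad_stateful()
functional = grad_functional()
print('stateful   dy/dx0:', stateful[0])    # no path from y to x0
print('functional dy/dx0:', functional[0])  # 2 * (x1 + x0)
```

In the stateful case the gradient with respect to x0 is None while the gradient with respect to x1 is still available; in the functional case both variables receive 2*(x1 + x0).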
Nilesh Ingle