
I've been playing around with the tf.gradients() function and came across a behavior I didn't expect. Namely, it seems unable to calculate the gradient with respect to a sliced Variable. I put together an example that hopefully shows what I mean:

import tensorflow as tf

a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat(0, [a, b])
print(c)  # >Tensor("concat:0", shape=(2,), dtype=float32)

grad_full = tf.gradients(c, c)
grad_slice1 = tf.gradients(c, a)
grad_slice2 = tf.gradients(c, c[:, ])  # --> Here the gradient is None
grad_slice3 = tf.gradients(c, c[0, ])  # --> Here the gradient is None

print(grad_full)  # >[<tf.Tensor 'gradients/Fill:0' shape=(2,) dtype=float32>]
print(grad_slice1)  # >[<tf.Tensor 'gradients_1/concat_grad/Slice:0' shape=(1,) dtype=float32>]
print(grad_slice2)  # >[None]
print(grad_slice3)  # >[None]

sess = tf.Session()
sess.run(tf.initialize_all_variables())

grad_full_v, grad_slice_v = sess.run([grad_full[0], grad_slice1[0]])
print(grad_full_v)  # >[ 1.  1.]
print(grad_slice_v)  # >[ 1.]

My questions are:

1) Am I using the tf.gradients() function the way it is intended?

2) If so, is there a reason for this behavior? In my understanding, slicing should not necessarily break backpropagation.

3) Does that mean I need to avoid slicing within my entire network (or at least on every path from a Variable to the loss)? For example, this would mean that I must not slice the output of a fully connected layer into multiple meaningful parts (like estimating several scalars with one fc layer and then slicing the joint estimation into the parts I want to use; see the sketch below).
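
To make question 3 concrete, here is a minimal sketch of the pattern I mean (hypothetical layer and variable names, same graph-style API as above): one fully connected layer whose output is sliced into two scalar estimates that both feed the loss.

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4])
w = tf.Variable(tf.truncated_normal([4, 2]))
fc = tf.matmul(x, w)                 # joint estimation, shape (None, 2)
scalar_a = fc[:, 0]                  # first meaningful part
scalar_b = fc[:, 1]                  # second meaningful part
loss = tf.reduce_mean(tf.square(scalar_a) + tf.square(scalar_b))

# Does the gradient still reach w through the slices?
print(tf.gradients(loss, w))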

I'm working with TensorFlow 0.11 RC0, built from source, on Ubuntu 16 with Python 3.5.

zimmermc
  • I have encountered the same problem. Still no answer? – TNg Apr 04 '20 at 06:48
  • Anyway, if it helps, I believe there is still no "direct" solution to it (https://github.com/tensorflow/tensorflow/issues/834). Currently, there are at least two workarounds: (1) split the variable into the sliced part and the rest, and apply stop_gradient to the rest (https://stackoverflow.com/questions/49048622/tensorflow-minimise-with-respect-to-only-some-elements-of-a-variable), or (2) define separate variables and keep them in a list (which works nicely in my own problem). The gradient w.r.t. the sliced variable does not work and returns None, as you've observed. – TNg Apr 04 '20 at 07:29
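
For reference, a rough sketch of workaround (1) from the comment above, assuming a single 3-element variable of which only the first element should receive gradients (the variable names are illustrative and the tf.concat argument order matches the TF 0.11 API used in the question):

import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0])
v_train = v[0:1]                      # the part we want gradients for
v_fixed = tf.stop_gradient(v[1:])     # block gradients for the rest
v_combined = tf.concat(0, [v_train, v_fixed])

loss = tf.reduce_sum(tf.square(v_combined))

# The gradient w.r.t. v only flows through the first element.
print(tf.gradients(loss, v))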

1 Answer


d = c[:, ] creates a different tensor than a, b, or c. In the dependency graph, d depends on c, not the other way around. tf.gradients(ys, xs) only produces a gradient when ys depends on xs, so asking for the gradient of c with respect to one of its own slices returns None.
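
A minimal sketch of that dependency direction, using the same graph-style API as the question: d depends on c, so the gradient of d with respect to c exists, while the gradient of c with respect to d does not.

import tensorflow as tf

a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat(0, [a, b])
d = c[:, ]                  # d is a new tensor downstream of c

print(tf.gradients(d, c))   # works: c is an ancestor of d
print(tf.gradients(c, d))   # [None]: c does not depend on d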

user1454804