I've been playing around with the tf.gradients() function and came across a behavior I didn't expect. Namely, it seems to be unable to calculate the gradient with respect to a sliced Variable. I put together an example that hopefully shows what I mean:
import tensorflow as tf
a = tf.Variable([1.0])
b = tf.Variable([1.0])
c = tf.concat(0, [a, b])
print(c) # >Tensor("concat:0", shape=(2,), dtype=float32)
grad_full = tf.gradients(c, c)
grad_slice1 = tf.gradients(c, a)
grad_slice2 = tf.gradients(c, c[:, ]) # --> Here the gradient is None
grad_slice3 = tf.gradients(c, c[0, ]) # --> Here the gradient is None
print(grad_full) # >[<tf.Tensor 'gradients/Fill:0' shape=(2,) dtype=float32>]
print(grad_slice1) # >[<tf.Tensor 'gradients_1/concat_grad/Slice:0' shape=(1,) dtype=float32>]
print(grad_slice2) # >[None]
print(grad_slice3) # >[None]
sess = tf.Session()
sess.run(tf.initialize_all_variables())
grad_full_v, grad_slice_v = sess.run([grad_full[0], grad_slice1[0]])
print(grad_full_v) # >[ 1. 1.]
print(grad_slice_v) # >[ 1.]
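For comparison, I would expect the other direction, i.e. taking the gradient of a slice of c with respect to a, to go through, since the slice is computed from c. Roughly like this (I have not re-checked the printed value, so treat the expected output as an assumption on my part):
grad_of_slice = tf.gradients(c[0, ], a)  # gradient of the slice w.r.t. the upstream Variable
print(grad_of_slice) # I would expect a proper Tensor here rather than None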
My questions are:
1) Am I using the tf.gradients() function the way it is intended?
2) If so, is there a reason for this behavior? In my understanding, slicing should not necessarily break backpropagation.
3) Does that mean I need to avoid slicing within my entire network (or at least on every path from a Variable to the loss)? For example, this would mean that I must not slice the output of a fully connected layer into several meaningful parts (like estimating multiple scalars with one fc layer and then slicing the joint estimate into the parts I want to use) -- a rough sketch of what I mean follows below.
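To make question 3 concrete, this is roughly the kind of graph I mean; the shapes and names are just placeholders, not my actual network:
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[None, 4])           # some input features
w = tf.Variable(tf.truncated_normal([4, 2], stddev=0.1))  # weights of one fc layer
b = tf.Variable(tf.zeros([2]))
joint = tf.matmul(x, w) + b                                # joint estimate of two scalars
part1 = joint[:, 0:1]                                      # first scalar estimate
part2 = joint[:, 1:2]                                      # second scalar estimate
# The loss is built from the sliced parts, so every path from w/b to the loss
# goes through a slice.
loss = tf.reduce_mean(tf.square(part1)) + tf.reduce_mean(tf.square(part2))
grads = tf.gradients(loss, [w, b])
My worry is whether the slicing in the middle prevents tf.gradients() from producing useful gradients for w and b in a setup like this.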
I'm working with TensorFlow 0.11 RC0, built from source, on Ubuntu 16 with Python 3.5.