
It's related to: tensorflow modify variables in py_func (and its grad func)

I define my own op and its gradient in TensorFlow with the following function.

    import numpy as np
    import tensorflow as tf

    # Define a Python function as a TF op and register a custom gradient for it.
    def py_func_with_grad(func, inp, Tout, stateful=True, name=None, grad=None):
        # Register the gradient under a random name to avoid collisions between ops.
        rnd_name = 'PyFuncGrad' + ''.join(str(np.random.randint(0, 10)) for _ in range(100))
        tf.RegisterGradient(rnd_name)(grad)
        g = tf.get_default_graph()
        with g.gradient_override_map({"PyFunc": rnd_name}):
            return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
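
A minimal usage sketch, just to show how this helper is wired up (the toy square op and its hand-written gradient below are purely illustrative and not part of the model in question; the imports and the helper above are assumed to be in scope):

    # Forward pass: an ordinary NumPy function wrapped by tf.py_func.
    def _my_square(x):
        return np.square(x).astype(np.float32)

    # Custom gradient: returns one tensor per input of the forward op.
    def _my_square_grad(op, grad):
        x = op.inputs[0]
        return grad * 2.0 * x

    x = tf.placeholder(tf.float32, [None])
    # tf.py_func returns a list when Tout is a list, hence the [0].
    y = py_func_with_grad(_my_square, [x], [tf.float32], name='my_square', grad=_my_square_grad)[0]
    dy_dx = tf.gradients(y, x)[0]

    with tf.Session() as sess:
        print(sess.run([y, dy_dx], feed_dict={x: [1.0, 2.0, 3.0]}))  # y ~ [1, 4, 9], dy/dx ~ [2, 4, 6]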

I have a neural network that contains the following code snippet, in which there are five lines that should be numerically identical (only one of them is used at a time). They produce drastically different results. I wonder if anybody has any clue. Thanks!

For example, it is very strange that in (1) and (2), merely replacing x with a TF variable (s_final) makes such a difference. Since they are numerically the same, I thought there should not be any difference.

s_final is a non-trainable TensorFlow variable.

    def _identity_func(x, s):
        return s
    def _dummy_grad(op, grad):
        return grad * 0, grad * 0

    assign_op_s_final_2 = s_final.assign(x)
    with tf.control_dependencies([assign_op_s_final_2]):
        x = tf.identity(x)

    x = tf.stop_gradient(x)

    # The five alternatives below should be numerically identical, since s_final
    # has been assigned the value of x. But...
    # (1) with the following line, the network does not learn AT ALL!!
    x_revised = py_func_with_grad(_identity_func, [x, s_final], [tf.float32], name=name, grad=lambda op, grad: _dummy_grad(op, grad))
    # (2) with the following line, the network learns, even though x does not need any gradient (because of tf.stop_gradient)
    # x_revised = py_func_with_grad(_identity_func, [x, x], [tf.float32], name=name, grad=lambda op, grad: _dummy_grad(op, grad))
    # (3) with the following line, the network learns as well as (2)
    # x_revised = tf.stop_gradient(x)
    # (4) with the following line, the network learns, but seemingly not as well as (2)
    # x_revised = tf.stop_gradient(s_final)
    # (5) with the following line, the network does not learn AT ALL!!
    # x_revised = py_func_with_grad(_identity_func, [x, tf.stop_gradient(s_final)], [tf.float32], name=name, grad=lambda op, grad: _dummy_grad(op, grad))

Code is provided (it requires TensorFlow 0.12.1; it does not work with versions >= 1 because the HyperNetworks implementation it uses does not support TensorFlow >= 1):

https://www.dropbox.com/s/58khyqdy3mtnri7/tensorflow_clean_ver01.zip?dl=0

The above lines are in the provided code. Change them and run the model to see the difference. Let me know if you have any questions about the code.

You can install TensorFlow 0.12.1 into a temporary folder:

    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
    pip install --target=$HOME/tensorflow_versions/tf-0.12.1 --upgrade $TF_BINARY_URL

That path is then added when you run the provided code. I use this approach to keep multiple versions of TensorFlow on my machine.

  • Any comment is appreciated! – DataHungry Apr 27 '17 at 20:47
  • Can you provide a working example so that we can make our own tests easily? Otherwise it is too much work. – onur güngör May 19 '17 at 11:36
  • @onurgüngör code is provided – DataHungry May 21 '17 at 21:21
  • It is too much for me to install that specific version of Tensorflow, can you isolate the problem and share again? By doing that, you could run your code to see whether it is related with the Tensorflow version or not. – onur güngör May 21 '17 at 21:50
  • @onurgüngör it's hard to isolate this problem because it's related to how well the network trains. There's a clean way to install Tensorflow 0.12.1 in a temporary folder. Then the path is added when you run the provided code. I use this approach to have multiple versions of Tensorflow on my computer. – DataHungry May 21 '17 at 21:57
  • @DataHungry why don't you update ur code/project to use the most recent code of tensorflow? By not providing the most recent versions of tensorflow etc thing like that it makes it harder for people to help. The more you help us by lowering the barriers to entry to help you, the easier we can try to help. – Charlie Parker May 22 '17 at 21:08

2 Answers


It works fine in my experiments: I added code that uses x_revised and looked at the values of the gradients with respect to the other variables involved. The mistake must be in the code that is not posted.
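
Roughly the kind of check described above, as a small self-contained sketch (the tiny stand-in model, the weights w_up and w_down, and the loss are purely illustrative, not the asker's network; py_func_with_grad is the helper from the question):

    import numpy as np
    import tensorflow as tf

    def _identity_func(x, s):
        return s

    def _dummy_grad(op, grad):
        return grad * 0, grad * 0

    # Tiny stand-in model: one trainable weight upstream of the wrapped op
    # and one downstream, so we can see where gradients still flow.
    w_up = tf.Variable(2.0, name='w_up')
    x = w_up * tf.constant([1.0, 2.0, 3.0])
    s_final = tf.Variable(tf.zeros([3]), trainable=False, name='s_final')

    assign_op = s_final.assign(x)
    with tf.control_dependencies([assign_op]):
        x = tf.identity(x)
    x = tf.stop_gradient(x)

    # tf.py_func returns a list when Tout is a list, hence the [0].
    x_revised = py_func_with_grad(_identity_func, [x, s_final], [tf.float32],
                                  name='revised', grad=_dummy_grad)[0]

    w_down = tf.Variable(1.0, name='w_down')
    loss = tf.reduce_sum(tf.square(w_down * x_revised))

    grads = tf.gradients(loss, tf.trainable_variables())

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for var, g in zip(tf.trainable_variables(), grads):
            if g is None:
                print('%s : no gradient path' % var.name)
            else:
                print('%s : %s' % (var.name, sess.run(g)))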

MWB

My guess is that the assign is not actually executed. Note that at that point you are only building the graph; nothing has been executed yet (unlike PyTorch).
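
A tiny self-contained illustration of that point (the variable names are only for this demo): an assign op that is built but never fetched does not change the variable, whereas attaching it as a control dependency of something that is evaluated makes it run.

    import tensorflow as tf

    v = tf.Variable(0.0, name='v')   # stands in for s_final
    x = tf.constant(3.0)

    assign_op = v.assign(x)

    # Built but never fetched: evaluating this does NOT run the assign.
    y_plain = tf.identity(x)

    # Attached as a control dependency: evaluating this DOES run the assign.
    with tf.control_dependencies([assign_op]):
        y_with_dep = tf.identity(x)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(y_plain)
        print(sess.run(v))   # still 0.0 -- the assign was never executed
        sess.run(y_with_dep)
        print(sess.run(v))   # now 3.0 -- the assign ran via the control dependency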