
I would like to write a TensorFlow op in Python, but I would like it to be differentiable (to be able to compute a gradient).

This question asks how to write an op in Python, and the answer suggests using py_func (which has no gradient): Tensorflow: Writing an Op in Python

The TF documentation describes how to add an op starting from C++ code only: https://www.tensorflow.org/versions/r0.10/how_tos/adding_an_op/index.html

In my case, I am prototyping, so I don't care whether it runs on GPU, and I don't care about it being usable from anything other than the TF Python API.

– Alex I

2 Answers


Yes, as mentioned in @Yaroslav's answer, it is possible, and the key is the links he references: here and here. I want to elaborate on this answer by giving a concrete example.

Modulo operation: Let's implement the element-wise modulo operation in TensorFlow (it already exists, but its gradient is not defined; for the sake of the example we will implement it from scratch).

Numpy function: The first step is to define the operation we want for numpy arrays. The element-wise modulo operation is already implemented in numpy, so it is easy:

import numpy as np

def np_mod(x, y):
    # element-wise modulo, cast to float32 because that is what TensorFlow expects by default
    return (x % y).astype(np.float32)

The reason for the .astype(np.float32) is that TensorFlow takes float32 types by default, and it will complain if you give it float64 (the numpy default).
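As a quick check (the sample values here are made up for illustration), the cast indeed produces what TensorFlow expects:

a = np.array([0.3, 0.7])   # numpy arrays default to float64
b = np.array([0.2, 0.5])
print(np_mod(a, b).dtype)  # float32, matching the Tout we will declare later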

Gradient function: Next we need to define the gradient function for our operation, for each input of the operation, as a TensorFlow function. The function needs to take a very specific form: it takes the TensorFlow representation of the operation op and the gradient of the output grad, and says how to propagate the gradients. In our case, the gradients of the mod operation are easy: the derivative is 1 with respect to the first argument and -floor(x/y) with respect to the second (almost everywhere; it is undefined at a finite number of spots, but let's ignore that, see https://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator for details). So we have

def modgrad(op, grad):
    x = op.inputs[0]  # the first argument (normally you need the inputs to compute the gradient, e.g. the gradient of x^2 is 2x)
    y = op.inputs[1]  # the second argument

    # the propagated gradients with respect to the first and second argument respectively
    return grad * 1, grad * tf.neg(tf.floordiv(x, y))

The grad function needs to return an n-tuple where n is the number of arguments of the operation. Notice that we need to return TensorFlow expressions (tensors) of the inputs, not numpy values.
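For a one-argument op this is a single tensor. For instance, a gradient function for an element-wise square (a hypothetical illustration, matching the _MySquareGrad referenced in the code below) could look like:

def _MySquareGrad(op, grad):
    x = op.inputs[0]     # the op's single input
    return grad * 2 * x  # d(x^2)/dx = 2x, chained with the incoming gradient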

Making a TF function with gradients: As explained in the sources mentioned above, there is a hack to define the gradients of a function using tf.RegisterGradient and tf.Graph.gradient_override_map.

Copying the code from harpone, we can wrap tf.py_func so that it defines the gradient at the same time:

import numpy as np
import tensorflow as tf

def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Need to generate a unique name to avoid duplicate registrations:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 10 ** 8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for a grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

The stateful option tells TensorFlow whether the function always gives the same output for the same input (stateful=False), in which case TensorFlow can simplify the graph; this is our case and will probably be the case in most situations.

Combining it all together: Now that we have all the pieces, we can combine them all together:

from tensorflow.python.framework import ops

def tf_mod(x, y, name=None):

    with ops.op_scope([x, y], name, "mod") as name:
        z = py_func(np_mod,
                    [x, y],
                    [tf.float32],
                    name=name,
                    grad=modgrad)  # <-- here's the call to the gradient
        return z[0]

tf.py_func acts on lists of tensors (and returns a list of tensors), which is why we have [x,y] (and return z[0]). And now we are done, and we can test it.

Test:

with tf.Session() as sess:

    x = tf.constant([0.3,0.7,1.2,1.7])
    y = tf.constant([0.2,0.5,1.0,2.9])
    z = tf_mod(x,y)
    gr = tf.gradients(z, [x,y])
    tf.initialize_all_variables().run()

    print(x.eval(), y.eval(),z.eval(), gr[0].eval(), gr[1].eval())

[ 0.30000001 0.69999999 1.20000005 1.70000005] [ 0.2 0.5 1. 2.9000001] [ 0.10000001 0.19999999 0.20000005 1.70000005] [ 1. 1. 1. 1.] [ -1. -1. -1. 0.]

Success!
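As an extra sanity check (not part of the original recipe; it reuses the test values above), the gradient with respect to y can be compared against a plain numpy finite-difference estimate:

def numeric_dmod_dy(x, y, eps=1e-4):
    # central difference of x % y with respect to y
    return ((x % (y + eps)) - (x % (y - eps))) / (2 * eps)

x_np = np.array([0.3, 0.7, 1.2, 1.7])
y_np = np.array([0.2, 0.5, 1.0, 2.9])
print(numeric_dmod_dy(x_np, y_np))  # approximately [-1. -1. -1.  0.]
print(-np.floor(x_np / y_np))       # the analytic gradient -floor(x/y)

Both agree with the [-1. -1. -1. 0.] that tf.gradients returned above.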

– patapouf_ai
  • Thanks a lot for this post! Do you know how to tell TensorFlow the shape of `z`? `x` has shape (4), `y` has shape (4), but TensorFlow does not know that `z` has shape (4); only at runtime will it resolve that the shape is 4. – Philippe Remy Dec 16 '16 at 13:30
  • `z[0] = tf.reshape(z[0], [int(x.get_shape()[0])])` I used this to enforce it. – Philippe Remy Dec 16 '16 at 13:37
  • 1
    Are you sure your gradients are correct? Thanks!http://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator – Philippe Remy Dec 16 '16 at 18:10
  • @PhilippeRemy, you don't need to tell TensorFlow the shape of `z`; it deduces it automatically from `x` and the operator. – patapouf_ai Dec 19 '16 at 14:57
  • @PhilippeRemy, yes, good point about the gradients. I ignored the gradient with respect to `y` because I didn't care about it for my example, but you are right that that is not correct. I will edit my answer to reflect that. – patapouf_ai Dec 19 '16 at 14:58
  • 1
    Great answer. Just as a side note, `tf.py_func` cannot be used if you plan to serialize your graph (e.g. to run it in C++ later). In that case, you can still define an op an gradient, but it's just more complicated, since you have to do it in C++. There's a bit more of info in [this related answer](https://stackoverflow.com/a/43283357/1782792). – jdehesa May 23 '17 at 10:32
  • 1
    @patapouf_ai thanks for the great answer. I also need the numpy version of input in the grad function, so I call `op.inputs[0].eval()` in `modgrad`. So my question is that is it possible to implement everything in numpy in grad function as well? – gkcn Aug 08 '17 at 19:17
  • 1
    Ah ok, I should wrap the grad function into `tf.py_func` as well. – gkcn Aug 08 '17 at 21:09
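A sketch of what that last comment describes (the helper names here are made up, not from the answer itself): the gradient can itself be computed in numpy by wrapping it in tf.py_func inside the gradient function:

def np_modgrad_y(x, y):
    # numpy version of the gradient with respect to y
    return -np.floor(x / y).astype(np.float32)

def modgrad_numpy(op, grad):
    x, y = op.inputs[0], op.inputs[1]
    dy = tf.py_func(np_modgrad_y, [x, y], [tf.float32])[0]
    return grad * 1, grad * dy

Note that a numpy-based gradient is itself not differentiable, so this rules out higher-order gradients.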

Here's an example of adding a gradient to a specific py_func: https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342

Here's the issue discussion

– Yaroslav Bulatov
  • Thanks, that gist does answer the question. Mainly it is to call tf.RegisterGradient() and then gradient_override_map(). As the guys at the issue mention, that is a terribly hacky way because it relies on giving the function a name, but it seems the only way right now. Thanks again! – Alex I Sep 05 '16 at 07:24
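A condensed sketch of the mechanism that comment summarizes (the registration name "CustomModGrad" is made up; np_mod and the tensors x, y are from the accepted answer above):

@tf.RegisterGradient("CustomModGrad")
def _custom_mod_grad(op, grad):
    x, y = op.inputs[0], op.inputs[1]
    return grad, -grad * tf.floor(tf.div(x, y))

g = tf.get_default_graph()
with g.gradient_override_map({"PyFunc": "CustomModGrad"}):
    z = tf.py_func(np_mod, [x, y], [tf.float32])[0]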