
In TensorFlow, we can define our own op and its gradient following this gist: https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342

However, can we modify any variable in the computational graph in these Python functions, for example in the "_MySquareGrad" function?

I assume we can get the variable by:

var = tf.get_variable('var')

and then change its value and assign it back, e.g.

tmp = var*10
var.assign(tmp)

Thanks!

Also, when we do var*10, do we have to convert it to numpy?

Background: I'm familiar with automatic differentiation, but new to TensorFlow and Python. So please point out any syntactic problems and let me know if my intention is clear.

DataHungry
  • I would recommend you take a look at: http://stackoverflow.com/questions/42870727/can-one-only-implement-gradient-descent-like-optimizers-with-the-code-example-fr; basically, just code your update rule in TensorFlow entirely. – Charlie Parker Apr 02 '17 at 16:17
  • Here's a follow-up question: https://stackoverflow.com/questions/43462376/pieces-of-numerically-identical-code-produce-drastically-different-results – DataHungry Jun 12 '17 at 04:24

1 Answer


You can modify the variables in the computational graph in these Python functions. Your example code with tmp = var*10 will work and does not convert anything to numpy.

In fact, you should avoid converting to numpy as much as possible, since it slows down the computation.
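For instance, var * 10. just adds a multiply op to the graph, while going through numpy forces a copy from the device to host memory and back. Here is a minimal standalone sketch, assuming a TF 1.x session (the variable name is just for illustration):

import tensorflow as tf

var = tf.get_variable("var", shape=[], initializer=tf.constant_initializer(0.2))
tmp = var * 10.  # stays inside the graph, no numpy involved

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(tmp))           # 2.0, computed entirely in the graph

    # numpy round trip: device -> host -> device; avoid this in hot paths
    np_val = sess.run(var) * 10.0  # sess.run returns a numpy value
    print(np_val)                  # also 2.0, but via host memory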

edit:

You can include your code in the gradient computation graph of the _MySquareGrad function by doing this:

def _MySquareGrad(op, grad):

  # first get the Variable that was created using tf.get_variable()
  with tf.variable_scope("", reuse=True):
    var = tf.get_variable('var')

  # now create the assign graph:
  tmp = var * 10.
  assign_op = var.assign(tmp)

  # now make the assign op part of the gradient graph; tf.control_dependencies
  # only affects ops created inside its block, so wrap the input in tf.identity:
  with tf.control_dependencies([assign_op]):
    x = tf.identity(op.inputs[0])

  return grad * 20 * x

Here is a working example:

import tensorflow as tf
from tensorflow.python.framework import ops
import numpy as np

# Define custom py_func which takes also a grad op as argument:
def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

# Define a custom square function using np.square instead of tf.square:
def mysquare(x, name=None):

    with ops.name_scope(name, "Mysquare", [x]) as name:
        sqr_x = py_func(np.square,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=_MySquareGrad)  # <-- here's the call to the gradient
        return sqr_x[0]

### Actual gradient:
##def _MySquareGrad(op, grad):
##    x = op.inputs[0]
##    return grad * 20 * x  # the factor 20 (instead of the correct 2) adds a "small" error just to see the difference


def _MySquareGrad(op, grad):

  # first get the Variable that was created using tf.get_variable()
  with tf.variable_scope("", reuse=True):
    var = tf.get_variable('var')

  # now create the assign graph:
  tmp = var * 10.
  assign_op = var.assign(tmp)

  # now make the assign op part of the gradient graph; tf.control_dependencies
  # only affects ops created inside its block, so wrap the input in tf.identity:
  with tf.control_dependencies([assign_op]):
    x = tf.identity(op.inputs[0])

  return grad * 20 * x



with tf.Session() as sess:
    x = tf.constant([1., 2.])

    var = tf.get_variable(name="var", shape=[], initializer=tf.constant_initializer(0.2))

    y = mysquare(x)
    tf.global_variables_initializer().run()

    print(x.eval(), y.eval(), tf.gradients(y, x)[0].eval())
    print("Now var is 10 times larger:", var.eval())
BlueSun
  • Thank you so much! This is very interesting. I thought tmp = var*10, if carried out by TensorFlow, was a "mini computational graph". So I wasn't sure if we could embed this "mini computational graph" in the backpropagation of a larger computational graph. – DataHungry Apr 02 '17 at 16:55
  • One question: when we perform var.assign(tmp) in Python, does it take effect immediately? I thought TensorFlow only constructs the computational graph and waits until a sess.run() to evaluate the operation. Since we are already inside the backprop execution of a larger graph, are we able to perform var.assign(tmp) immediately? I'm confused because it's a mini computational graph within a larger graph. – DataHungry Apr 03 '17 at 13:24
  • @user21707 No, it does not take effect immediately. Just as you say, it only constructs a computational graph that will be executed once sess.run() is used on this graph, or on a larger graph that includes the small graph (see the sketch after these comments). I added an example showing how to add your computational graph to the larger gradient-computation graph. – BlueSun Apr 03 '17 at 17:14
  • Thank you so much! I solved this issue with your suggestions. – DataHungry Apr 18 '17 at 02:08
  • However, I did encounter another issue: http://stackoverflow.com/questions/43462376/pieces-of-numerically-identical-code-produce-drastically-different-results – DataHungry Apr 18 '17 at 02:08
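
To illustrate the deferred-execution point from the comments above, here is a minimal standalone sketch, assuming TF 1.x (the variable name is hypothetical): var.assign(tmp) only builds an op, and the variable's value does not change until that op is actually run:

import tensorflow as tf

v = tf.get_variable("v", shape=[], initializer=tf.constant_initializer(1.0))
assign_op = v.assign(v * 10.)  # only builds the assign op, nothing runs yet

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(v))   # 1.0 -- the assign has not been executed
    sess.run(assign_op)  # now the assignment actually happens
    print(sess.run(v))   # 10.0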