TensorFlow's gradient_override_map function

Question

Can someone explain the gradient_override_map function in TensorFlow to me? I couldn't work out exactly how it is used.

I have seen it used like this:

with G.gradient_override_map({"Floor": "Identity"}):
    return tf.reduce_mean(SomeVals) * SomeOtherVal

What exactly is happening here? What is Identity?

Just for clarification purposes, the naming "Identity" of the operation does not matter. It is changing the gradient op of all identity ops in the block — mshlis, May 06 '19 at 14:21

Yilin He · Answer 1 · 2019-08-19T03:29:08.693

Both "Floor" and "Identity" are op type strings: the former corresponds to tf.floor and the latter to tf.identity. So my guess is that your code replaces the back-propagated gradient (BPG for short) computation of tf.floor operations created inside the with block in graph G with that of tf.identity, while the forward output of tf.reduce_mean is passed through unchanged. It seems a little odd, because in every application of gradient_override_map I have found so far, the key of op_type_map matches the type string of the operation that produces the output inside the context. In other words, I am more used to seeing tf.floor(SomeVals) returned rather than tf.reduce_mean(SomeVals).
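To make the Floor/Identity case concrete, here is a minimal sketch of the scenario I am more familiar with, where tf.floor itself is created inside the context. It assumes TF 1.x-style graph mode (reached through tf.compat.v1 so it should also run on TF 2.x); the tensor names and input values are made up for illustration.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API; an assumption so the sketch also runs on TF 2.x

g = tf.Graph()
with g.as_default():
    x = tf1.placeholder(tf.float32, shape=[3])

    # Floor ops created in this block keep their forward computation,
    # but back-propagate with the gradient registered for "Identity".
    with g.gradient_override_map({"Floor": "Identity"}):
        y = tf.floor(x)

    loss = tf.reduce_sum(y * y)
    (grad,) = tf1.gradients(loss, x)

with tf1.Session(graph=g) as sess:
    y_val, grad_val = sess.run([y, grad], feed_dict={x: [0.5, 1.7, 2.2]})

print("forward :", y_val)     # floor is still applied in the forward pass
print("gradient:", grad_val)  # d(loss)/dy = 2 * y passes straight through to x
```

The derivative of floor is zero almost everywhere, so without the override no useful gradient would reach x. With the override, the upstream gradient 2 * y is handed to x unchanged, which is the usual reason for pairing "Floor" with "Identity".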

What gradient_override_map({op_A_type: op_B_type}) does is replace op_A's BPG computation with op_B's while keeping op_A's forward computation. A common application of gradient_override_map is shown in lahwran's answer:

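import tensorflow as tf  # TF 1.x graph-mode API

# Register a gradient function under the type string "CustomGrad".
@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad  # scale the incoming gradient by 5

g = tf.get_default_graph()
# Identity ops created in this block use "CustomGrad" for their backward pass.
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity")  # `input`: a tensor defined elsewhere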

In

@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad

the decorator tf.RegisterGradient("CustomGrad") registers the gradient function _const_mul_grad(unused_op, grad) under a custom op type string, "CustomGrad",

while

g = tf.get_default_graph()
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity")

ensures that the outputs of all operations with type string "Identity" (tf.identity) created inside the with block in graph g stay as they were, whereas the BPG computation of those tf.identity ops is replaced by the gradient function registered under the type string "CustomGrad".
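The walkthrough above can be checked end to end with a small runnable sketch. It assumes graph mode via tf.compat.v1, and it registers the gradient under a hypothetical type string, "ScaleGradByFive", rather than "CustomGrad", because registering the same name twice in one process raises an error. The op name "any_name_works" is also made up.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API; an assumption so the sketch also runs on TF 2.x


# Hypothetical type string; it only needs to be unique within the process.
@tf.RegisterGradient("ScaleGradByFive")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad


g = tf.Graph()
with g.as_default():
    x = tf1.placeholder(tf.float32, shape=[])

    # Created inside the context: forward pass unchanged, custom backward pass.
    with g.gradient_override_map({"Identity": "ScaleGradByFive"}):
        inside = tf.identity(x, name="any_name_works")

    # Created outside the context: the ordinary identity gradient applies.
    outside = tf.identity(x)

    (grad_inside,) = tf1.gradients(inside, x)
    (grad_outside,) = tf1.gradients(outside, x)

with tf1.Session(graph=g) as sess:
    inside_val, grad_inside_val, grad_outside_val = sess.run(
        [inside, grad_inside, grad_outside], feed_dict={x: 3.0}
    )

print(inside_val, grad_inside_val, grad_outside_val)
```

Only the identity op built inside the with block picks up the scaled gradient; its forward value is still the input, and the identity op built outside the block is unaffected.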

P.S.

The type string of an op corresponds to the OpDef.name field of the proto that defines the operation. To find an op's OpDef.name, please refer to MingXing's answer under this question.
It is not necessary to pass a name to the tf.identity operation, since the name argument of tf.identity is optional.
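As a quick way to see which type string a given call produces, the sketch below reads the op.type attribute of the output tensor in graph mode. It assumes tf.compat.v1 graph mode; the dictionary keys are only labels for printing.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # graph-mode API; an assumption so the sketch also runs on TF 2.x

g = tf.Graph()
with g.as_default():
    x = tf1.placeholder(tf.float32, shape=[None])

    # Each tensor remembers the op that produced it; op.type is the type string
    # that gradient_override_map matches its keys against.
    op_types = {
        "tf.identity(x)": tf.identity(x).op.type,
        "tf.floor(x)": tf.floor(x).op.type,
        "tf.reduce_mean(x)": tf.reduce_mean(x).op.type,
        "x * 2.0": (x * 2.0).op.type,
    }

for call, type_string in op_types.items():
    print(call, "->", type_string)
```

This also bears on the code in the question: tf.reduce_mean and the multiplication do not create "Floor" ops, so a {"Floor": "Identity"} override around them would only matter if a tf.floor op is created somewhere inside the same with block.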

I know it’s just an example, but isn’t it a bit dangerous to change the gradient of the identity function? I mean, the identity function is the function that you should be able to apply without causing any changes, so it can in principle have been used an arbitrary number of times in any TF function since it doesn’t change anything, right? Changing its gradient makes it suddenly have an effect, which feels like it may give rise to unexpected consequences. — HelloGoodbye, Jun 25 '19 at 21:11
@HelloGoodbye your worry is reasonable: the identity function in computation graph g has indeed been manually edited. However, it is not beyond our control, since we know everything about the op: the forward function (which remains unchanged) and the backward gradient (as we defined it). All we need to do is handle the changed op with extra caution. — Yilin He, Aug 07 '19 at 16:48

lahwran · Answer 2 · 2017-11-01T23:22:03.980

As best as I can tell, gradient_override_map allows you to say "in this context, any time you would use the gradient of X, instead use the gradient of Y". This means you still need the gradient of Y to be the gradient you want to use.

This is an example I've seen floating around while looking for how this works:

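import tensorflow as tf  # TF 1.x graph-mode API

# Register a gradient function under the type string "CustomGrad".
@tf.RegisterGradient("CustomGrad")
def _const_mul_grad(unused_op, grad):
    return 5.0 * grad  # scale the incoming gradient by 5

g = tf.get_default_graph()
# Identity ops created in this block use "CustomGrad" for their backward pass.
with g.gradient_override_map({"Identity": "CustomGrad"}):
    output = tf.identity(input, name="Identity")  # `input`: a tensor defined elsewhere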

cite: https://stackoverflow.com/a/43948872/1102705

RegisterGradient() allows you to register the gradient of a new op you're defining, thereby allowing you to have an op that has the gradient you wanted, and then you can use that op in the gradient override map. It's kind of clunky - you're defining an op with no forward pass.

Something I'm not clear on is whether the name="Identity" is actually necessary.
