
I have a custom model that takes an arbitrary "hidden model" as an input and wraps it in another model. The wrapper treats the hidden model's output as a return, and computes the implied output by adding 1 and multiplying by the original (reference price) data:

import tensorflow as tf


class Model(tf.keras.Model):
    def __init__(self, hidden_model):
        super(Model, self).__init__(name='')
        self.hidden_model = hidden_model

    def build(
        self,
        reference_price_shape,
        hidden_inputs_shape):

        super(Model, self).build([reference_price_shape, hidden_inputs_shape])

    def call(self, inputs):
        # inputs is a pair: [reference prices, inputs for the hidden model]
        reference_prices = inputs[0]
        hidden_layers_input = inputs[1]
        hidden_output = self.hidden_model(hidden_layers_input)
        # treat the hidden output as a return and turn it back into a price
        return (hidden_output + 1) * reference_prices

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], 1)

However, I'd now like to know how sensitive the model is to changes in each of its inputs. To do this I thought I'd be able to use keras.backend.gradients:

rows = 10
cols = 2

hidden_model = tf.keras.Sequential()

hidden_model.add(
    tf.keras.layers.Dense(
        1,
        name='output',
        use_bias=True,
        kernel_initializer=tf.constant_initializer(0.1),
        bias_initializer=tf.constant_initializer(0)))

model = Model(hidden_model)
model.build(
    reference_price_shape=(rows,),
    hidden_inputs_shape=(rows, cols))

from tensorflow.keras import backend as K
grads = K.gradients(model.output, model.input)

However, this returns an error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
in
      1 from tensorflow import keras
      2 from tensorflow.keras import backend as K
----> 3 K.gradients(hidden_model.output, hidden_model.input)

/usr/lib64/python3.6/site-packages/tensorflow_core/python/keras/backend.py in gradients(loss, variables)
   3795     """
   3796     return gradients_module.gradients(
-> 3797         loss, variables, colocate_gradients_with_ops=True)
   3798
   3799

/usr/lib64/python3.6/site-packages/tensorflow_core/python/ops/gradients_impl.py in gradients(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method, stop_gradients, unconnected_gradients)
    156       ys, xs, grad_ys, name, colocate_gradients_with_ops,
    157       gate_gradients, aggregation_method, stop_gradients,
--> 158       unconnected_gradients)
    159   # pylint: enable=protected-access
    160

/usr/lib64/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py in _GradientsHelper(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method, stop_gradients, unconnected_gradients, src_graph)
    503   """Implementation of gradients()."""
    504   if context.executing_eagerly():
--> 505     raise RuntimeError("tf.gradients is not supported when eager execution "
    506                        "is enabled. Use tf.GradientTape instead.")
    507   if src_graph is None:

RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.

I had a look at the guide for tf.GradientTape, based on which I tried to add the following to my code:

with tf.GradientTape() as g:
  g.watch(x)

But where do I put this? x is a tensor, and I don't have an input tensor. I just have inputs, which is an array of numpy arrays.
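
For reference, the standalone toy example from the guide looks roughly like this (a scalar x created directly as a tensor, which is exactly what I don't have):

x = tf.constant(3.0)
with tf.GradientTape() as g:
  g.watch(x)
  y = x * x
dy_dx = g.gradient(y, x)  # 6.0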

Just to add to the confusion, there's a github post here that seems to suggest this is a TensorFlow 2.0 bug, and that adding tf.compat.v1.disable_eager_execution() will resolve the issue for me. It didn't (although it did change the above error to Layer model_1 has no inbound nodes. - not sure whether that's a step forwards or backwards).

Sorry I realise this question is bordering on untenable, but at this point I'm really confused and this is probably the best I can do at framing it as something answerable.

As a test I tried running K.gradients with hidden_model instead, which kind of worked:

[screenshot of the gradient tensor returned by K.gradients]

But I don't know what to do with this, as I usually run my model using model.predict(input_data) - how am I supposed to get the local derivatives using that tensor?
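
(For context, "running the model" for me looks roughly like this, with illustrative toy data in place of my real inputs, and shapes matching the build call above:)

import numpy as np

# toy inputs: a vector of reference prices and a matrix of hidden-model inputs
reference_prices = np.random.normal(size=(rows,)).astype('float32')
hidden_inputs = np.random.normal(size=(rows, cols)).astype('float32')
predictions = model.predict([reference_prices, hidden_inputs])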

So I think I have two problems:

  1. How do I calculate the derivative of my output with respect to my input for the whole model - it's tensors all the way through so Keras/tensorflow really should be able to apply the chain rule even with my custom call() function/model.
  2. Once I have the derivative tensor, what do I do with it?

I initially thought I should try to separate these questions, but either of them asked alone might be an XY problem so I thought I'd ask them together to give the answerers some context.

  • Any particular reason you're using `tf.keras` for model and `keras` for backend? They are not always compatible – thushv89 Jan 04 '20 at 07:54
    @thushv89 no this is not deliberate - thanks for the advice. I've changed it to `tf.keras.backend.gradients` which has led me to a new error - see updated question! – quant Jan 04 '20 at 08:08

1 Answer


It is possible but requires some work (apparently). I would love to see a more elegant solution, but this is as good as it got for me.

import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np

rows = 10
cols = 2

with tf.Graph().as_default():


  hidden_model = tf.keras.Sequential()

  hidden_model.add(
      tf.keras.layers.Dense(
          1,
          name='output',
          use_bias=True,
          kernel_initializer=tf.constant_initializer(0.1),
          bias_initializer=tf.constant_initializer(0)))

  model = Model(hidden_model)
  model.build(
      reference_price_shape=(rows,),
      hidden_inputs_shape=(rows, cols))

Note that the model building needs to happen in the same graph you compute the gradients in. It probably doesn't need to be the default graph, but it does need to be the same graph.

Then, within the same graph context, create a gradient tape context. Also note that x needs to be a tf.Variable() in order for the tape to register it as an input to the gradient.

  with tf.GradientTape() as tape:
    x = tf.Variable(np.random.normal(size=(10, rows, cols)), dtype=tf.float32)
    out = model(x)

With that you can get the gradients.

  grads = tape.gradient(out, x)

  sess = tf.compat.v1.keras.backend.get_session()
  sess.run(tf.compat.v1.global_variables_initializer())
  g = sess.run(grads)
  print(g)
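
If you instead keep eager execution enabled (i.e. skip the tf.Graph()/session machinery above and build the model normally), a rough sketch of the same idea is to convert the inputs to tensors and watch them explicitly; something along these lines, assuming the same Model, hidden_model, rows and cols as above:

import numpy as np
import tensorflow as tf

# convert the numpy inputs to tensors so the tape can track them
reference_prices = tf.convert_to_tensor(
    np.random.normal(size=(rows,)), dtype=tf.float32)
hidden_inputs = tf.convert_to_tensor(
    np.random.normal(size=(rows, cols)), dtype=tf.float32)

with tf.GradientTape() as tape:
    # plain tensors are not tracked automatically, so watch them explicitly
    tape.watch([reference_prices, hidden_inputs])
    out = model([reference_prices, hidden_inputs])

# d(out)/d(reference_prices) and d(out)/d(hidden_inputs)
d_ref, d_hidden = tape.gradient(out, [reference_prices, hidden_inputs])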
  • I don't really know what to do with this. What does tf.Graph().as_default() do? The documentation says it's deprecated. Also, what is the purpose of the session object, and how do I run my model once I evaluate the gradient? – quant Jan 04 '20 at 12:17
  • I tried running `tape.gradient(out, x)` but it returned `NoneType`. – quant Jan 04 '20 at 12:19
  • Maybe I should be asking a different question - surely this is not usually so complicated. What's the "normal" way of getting a derivative from a `keras` model? – quant Jan 04 '20 at 12:20
  • I'm going to mark this as the answer, as it's probably as good a response as you could give. However, I've started a new question that removes the added complexity of the "custom" model from the equation, hopefully attracting a more elegant solution. – quant Jan 04 '20 at 12:51
  • https://stackoverflow.com/questions/59590766/how-do-i-get-the-gradient-of-a-keras-model-with-respect-to-its-inputs?noredirect=1#comment105347093_59590766 – quant Jan 04 '20 at 12:52