I have a custom model that takes an arbitrary "hidden model" as an input and wraps it in another model that treats the hidden model's output as a return, computing the implied output by adding 1 and multiplying by the original data:
class Model(tf.keras.Model):
    def __init__(self, hidden_model):
        super(Model, self).__init__(name='')
        self.hidden_model = hidden_model

    def build(self, reference_price_shape, hidden_inputs_shape):
        super(Model, self).build([reference_price_shape, hidden_inputs_shape])

    def call(self, inputs):
        reference_prices = inputs[0]
        hidden_layers_input = inputs[1]
        hidden_output = self.hidden_model(hidden_layers_input)
        return (hidden_output + 1) * reference_prices

    def compute_output_shape(self, input_shape):
        return (input_shape[0][0], 1)
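For reference, once the model is built (as below) I drive it with plain numpy arrays - something like this, with made-up data just to show the calling convention I rely on later:

import numpy as np

# Made-up example data: 10 reference prices and a (10, 2) feature matrix
reference_prices = np.random.rand(10).astype('float32')
hidden_inputs = np.random.rand(10, 2).astype('float32')

# This is how I normally get outputs out of the model
prediction = model.predict([reference_prices, hidden_inputs])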
However, I'd now like to know how sensitive the model is to changes in each of the inputs. To do this I thought I'd be able to use keras.backend.gradients:
rows = 10
cols = 2

hidden_model = tf.keras.Sequential()
hidden_model.add(
    tf.keras.layers.Dense(
        1,
        name='output',
        use_bias=True,
        kernel_initializer=tf.constant_initializer(0.1),
        bias_initializer=tf.constant_initializer(0)))

model = Model(hidden_model)
model.build(
    reference_price_shape=(rows,),
    hidden_inputs_shape=(rows, cols))

from tensorflow.keras import backend as K
grads = K.gradients(model.output, model.input)
However, this returns an error:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 from tensorflow import keras
      2 from tensorflow.keras import backend as K
----> 3 K.gradients(hidden_model.output, hidden_model.input)

/usr/lib64/python3.6/site-packages/tensorflow_core/python/keras/backend.py in gradients(loss, variables)
   3795     """
   3796     return gradients_module.gradients(
-> 3797         loss, variables, colocate_gradients_with_ops=True)
   3798
   3799

/usr/lib64/python3.6/site-packages/tensorflow_core/python/ops/gradients_impl.py in gradients(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method, stop_gradients, unconnected_gradients)
    156         ys, xs, grad_ys, name, colocate_gradients_with_ops,
    157         gate_gradients, aggregation_method, stop_gradients,
--> 158         unconnected_gradients)
    159   # pylint: enable=protected-access
    160

/usr/lib64/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py in _GradientsHelper(ys, xs, grad_ys, name, colocate_gradients_with_ops, gate_gradients, aggregation_method, stop_gradients, unconnected_gradients, src_graph)
    503   """Implementation of gradients()."""
    504   if context.executing_eagerly():
--> 505     raise RuntimeError("tf.gradients is not supported when eager execution "
    506                        "is enabled. Use tf.GradientTape instead.")
    507   if src_graph is None:

RuntimeError: tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.
I had a look at the guide for tf.GradientTape, based on which I tried to add the following to my code:
with tf.GradientTape() as g:
    g.watch(x)
But where do I put this? x is a tensor, and I don't have an input tensor - I just have inputs, which is an array of numpy arrays.
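My best guess - and it is only a guess - is that I'm meant to convert my numpy arrays to tensors and run the forward pass inside the tape, something like:

# Guesswork on my part: make the inputs tensors so the tape can watch them
ref_tensor = tf.convert_to_tensor(reference_prices, dtype=tf.float32)
hid_tensor = tf.convert_to_tensor(hidden_inputs, dtype=tf.float32)

with tf.GradientTape() as g:
    g.watch([ref_tensor, hid_tensor])
    output = model([ref_tensor, hid_tensor])

# If this is right, grads[0] and grads[1] should be the sensitivities
# of the output to each of the two inputs
grads = g.gradient(output, [ref_tensor, hid_tensor])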
Just to add to the confusion, there's a github post here that seems to suggest this is a tensorflow 2.0 bug, and that adding tf.compat.v1.disable_eager_execution() will resolve the issue for me. It didn't (although it did change the above error to Layer model_1 has no inbound nodes. - not sure if that's a step forwards or backwards).
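That "no inbound nodes" error makes me wonder whether model.input/model.output only exist once the model has been called on symbolic inputs, i.e. whether (with eager disabled) I'd first need something like the following. This is pure speculation on my part, and the Input shapes are guesses:

# Speculation: give the subclassed model symbolic inputs so that it has
# inbound nodes, then take gradients between the symbolic tensors
ref_in = tf.keras.Input(shape=(1,), name='reference_price')
hid_in = tf.keras.Input(shape=(cols,), name='hidden_input')
out = model([ref_in, hid_in])
grads = K.gradients(out, [ref_in, hid_in])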
Sorry - I realise this question is bordering on untenable, but at this point I'm really confused, and this is probably the best I can do at framing it as something answerable.
As a test I tried running K.gradients with hidden_model instead, which kind of worked, in that it gave me back a gradient tensor. But I don't know what to do with this, as I usually run my model using model.predict(input_data) - how am I supposed to get the local derivatives using that tensor?
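If I'm reading the graph-mode docs correctly, the answer might be to wrap the gradient tensors in K.function, roughly like this - though I haven't managed to make this work for my wrapped model:

# Rough idea (graph mode): compile the symbolic gradients into a
# callable that maps numpy inputs to numpy gradient values
grad_fn = K.function(hidden_model.inputs, grads)
grad_values = grad_fn([hidden_inputs])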
So I think I have two problems:

- How do I calculate the derivative of my output with respect to my input for the whole model? It's tensors all the way through, so Keras/tensorflow really should be able to apply the chain rule, even with my custom call() function/model.
- Once I have the derivative tensor, what do I do with it?
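For what it's worth, the only fallback I can picture for the second problem is a brute-force finite-difference check around model.predict - which is exactly what I'd hoped the gradients would replace (eps picked arbitrarily):

# Finite-difference sensitivity to the reference prices: bump and re-predict
eps = 1e-4
base = model.predict([reference_prices, hidden_inputs])
bumped = model.predict([reference_prices + eps, hidden_inputs])
sensitivity = (bumped - base) / eps  # approx d(output)/d(reference_prices)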
I initially thought I should try to separate these questions, but either one asked alone might be an XY problem, so I thought I'd ask them together to give answerers some context.