I'm trying to differentiate my loss function with respect to the model output in the train_step function of a tf.keras.Model. This is my attempt:

    def train_step(self, data):
        x, y = data
        with tf.GradientTape(persistent=True) as tape1:
            y_pred, dense_out = self(x, training=True)
            with tf.GradientTape() as tape2:
                tape2.watch(y_pred)
                loss = self.compiled_loss(y, y_pred)
            dy = tape2.gradient(loss, y_pred)

I would use tape1 for gradients I need later. First of all, dy is None. How can I fix this? Secondly, is it allowed to return two output values when training=True in the call method of my model and only one when training=False? Even if I return a single output in both cases, dy is still None.
Edit: If I do the following outside the train_step function, it does give a result different from None:

    with tf.GradientTape() as tape:
        y_pred, dense_out = model(x_train, training=True)
        loss = loss_fn(y_train, y_pred)
    print(tape.gradient(loss, y_pred))
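
A minimal, self-contained version of this standalone check might look like the sketch below. The model (TwoOutputModel), the layer sizes, the random data, and the choice of MeanSquaredError for loss_fn are placeholders assumed for illustration, not the actual setup. It only reproduces the case that already works (gradient of the loss with respect to y_pred in eager mode) and does not involve train_step or compiled_loss.

```python
import tensorflow as tf


class TwoOutputModel(tf.keras.Model):
    """Hypothetical toy model, assumed for illustration only."""

    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(4, activation="relu")
        self.head = tf.keras.layers.Dense(1)

    def call(self, inputs, training=False):
        dense_out = self.hidden(inputs)
        y_pred = self.head(dense_out)
        # Return the intermediate activation only when training=True.
        if training:
            return y_pred, dense_out
        return y_pred


# Placeholder model, loss, and data (assumptions, not the real setup).
model = TwoOutputModel()
loss_fn = tf.keras.losses.MeanSquaredError()
x_train = tf.random.normal((8, 3))
y_train = tf.random.normal((8, 1))

with tf.GradientTape() as tape:
    # y_pred is computed inside the tape, so the tape records how it was produced.
    y_pred, dense_out = model(x_train, training=True)
    loss = loss_fn(y_train, y_pred)

# Gradient of the scalar loss with respect to the intermediate tensor y_pred.
dy = tape.gradient(loss, y_pred)
print(dy.shape)
```

Here dy has the same shape as y_pred, which makes it a convenient baseline to compare against whatever train_step produces.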