In short: I have a custom loss layer in TensorFlow/Keras 2+, which implements a loss function involving two variables that are themselves minimized during training. It works, as shown below. I wish to track the loss gradients with respect to these two variables. GradientTape.gradient() seems to work, judging from the tf.print() output, but I have no idea how to keep the actual values.
In detail:
Suppose this is my custom loss layer (yes, the loss function is silly; everything is over-simplified for reproducibility):
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model
class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1)  # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)

    def get_vars(self):
        return self.var1, self.var2

    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true - y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        return loss, g.gradient(loss, [self.var1, self.var2])

    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads  # stash the gradient tensors for later retrieval
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred
Suppose these are my data and Model (yes, y enters the model as an additional input; this works and isn't related to the issue):
n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)
model.compile(optimizer='adam')
Now the model and loss work, as is evident from the variables' profile, e.g. by keeping the variables after each epoch (their values also make sense if you check the silly loss):
var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())
plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()
But when I wish to observe/keep the gradients, I get a list of (empty?) Tensors:
grads = model.layers[-1].get_gradients()
grads
ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])
There is of course no point in calling numpy() on these:
grads[0].numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'
However, something is obviously right here, since when I use tf.print(grads) to print the gradients while training (uncomment the tf.print(grads) inside the call() function above), the gradient values are printed and they also make sense:
[226.651245, 1] [293.38916, 0.998] [263.979889, 0.996000171] [240.448029, 0.994000435] [337.309021, 0.992001] [286.644775, 0.990001857] [194.823975, 0.988003075] [173.756546, 0.98600477] [267.330505, 0.984007] [139.302826, 0.982009768] [310.315216, 0.980013192] [263.746216, 0.97801733] [267.713, 0.976022303] [291.754578, 0.974028111] [376.523895, 0.972034812] [474.974884, 0.970042467] [375.520294, 0.968051136] etc. etc.
Note there is no need to add g.watch([self.var1, self.var2]), though adding it doesn't change the issue.
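My current guess is that this is an eager-vs-graph issue: Keras traces call() into a graph, so the tensors I stash in self.grads during tracing are symbolic, while tf.print() executes inside the graph and therefore sees the concrete values. Here is a minimal standalone sketch that seems to reproduce the same effect (hypothetical, independent of the model above):

import tensorflow as tf

v = tf.Variable(0.5)
stash = []

@tf.function
def step():
    with tf.GradientTape() as g:
        loss = v ** 2
    grad = g.gradient(loss, v)
    stash.append(grad)  # appends the *symbolic* tensor created during tracing
    tf.print(grad)      # executes at graph run time, so it prints 1.0
    return grad

out = step()
print(out.numpy())  # works: the returned tensor is eager again
# stash[0].numpy()  # fails with the same AttributeError as above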
How do I keep track of those gradients (like I keep track of var1 and var2)? What does tf.print() "see" that I can't see?