In short: I have a custom loss layer in TensorFlow/Keras 2+ that implements a loss function involving two variables, which are themselves optimized during training. This works, as shown below. I want to track the gradients of the loss with respect to these two variables. GradientTape.gradient() seems to work, judging by the tf.print() output, but I don't know how to keep the actual values.

In detail:

Suppose this is my custom loss layer (yes, the loss function is silly, everything is over-simplified for reproducibility):

import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Layer
from tensorflow.keras.callbacks import EarlyStopping, Callback
import tensorflow.keras.backend as K
from tensorflow.keras import Model

class MyLoss(Layer):
    def __init__(self, var1, var2):
        super(MyLoss, self).__init__()
        self.var1 = K.variable(var1) # or tf.Variable(var1) etc.
        self.var2 = K.variable(var2)
    
    def get_vars(self):
        return self.var1, self.var2
    
    def get_gradients(self):
        return self.grads

    def custom_loss(self, y_true, y_pred):
        loss = self.var1 * K.mean(K.square(y_true-y_pred)) + self.var2 ** 2
        return loss

    def compute_gradients(self, y_true, y_pred):
        # Record the loss on the tape, then differentiate once the context has closed
        with tf.GradientTape() as g:
            loss = self.custom_loss(y_true, y_pred)
        return loss, g.gradient(loss, [self.var1, self.var2])
    
    def call(self, y_true, y_pred):
        loss, grads = self.compute_gradients(y_true, y_pred)
        self.grads = grads
        # tf.print(grads)
        self.add_loss(loss)
        return y_pred

Suppose these are my data and model (yes, y enters the model as an additional input; this works and is unrelated to the question):

n_col = 10
n_row = 1000
X = np.random.normal(size=(n_row, n_col))
beta = np.arange(10)
y = X @ beta

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

inputs = Input(shape=(X_train.shape[1],))
y_input = Input(shape=(1,))
hidden1 = Dense(10)(inputs)
output = Dense(1)(hidden1)
my_loss = MyLoss(0.5, 0.5)(y_input, output) # here can also initialize those var1, var2
model = Model(inputs=[inputs, y_input], outputs=my_loss)

model.compile(optimizer='adam')

Now the model and loss work, as the trajectories of the two variables show, e.g. when recording them after each epoch (their values also make sense if you check the silly loss):

var1_list = []
var2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    var1, var2 = model.layers[-1].get_vars()
    var1_list.append(var1.numpy())
    var2_list.append(var2.numpy())

plt.plot(var1_list, label='var1')
plt.plot(var2_list, 'r', label='var2')
plt.legend()
plt.show()

[Plot: var1 and var2 after each epoch]

But when I try to inspect or store the gradients, I get a list of (empty?) Tensors:

grads = model.layers[-1].get_gradients()
grads

ListWrapper([<tf.Tensor 'gradient_tape/model/my_loss/mul/Mul:0' shape=() dtype=float32>, <tf.Tensor 'gradient_tape/model/my_loss/pow/mul_1:0' shape=() dtype=float32>])

There is no point in calling numpy() on these, of course:

grads[0].numpy()

AttributeError: 'Tensor' object has no attribute 'numpy'

However, something is obviously working here: when I use tf.print(grads) to print the gradients during training (uncomment the tf.print(grads) line inside call() above), the gradient values are printed, and they also make sense:

[226.651245, 1]
[293.38916, 0.998]
[263.979889, 0.996000171]
[240.448029, 0.994000435]
[337.309021, 0.992001]
[286.644775, 0.990001857]
[194.823975, 0.988003075]
[173.756546, 0.98600477]
[267.330505, 0.984007]
[139.302826, 0.982009768]
[310.315216, 0.980013192]
[263.746216, 0.97801733]
[267.713, 0.976022303]
[291.754578, 0.974028111]
[376.523895, 0.972034812]
[474.974884, 0.970042467]
[375.520294, 0.968051136]
etc. etc.
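
One way to check that the second column makes sense: the var2 ** 2 term gives a gradient of 2 * var2, which is 1 at the initial value of 0.5, and each Adam step moves var2 by roughly the learning rate. The sketch below replays that in plain Python. It assumes the textbook Adam update with the Keras defaults (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7); simulate_var2_grads is a hypothetical helper written for this illustration, and Keras places epsilon slightly differently, so only the first few digits should be expected to agree.

```python
# Replay Adam on var2 alone. Only the var2 ** 2 term of the loss depends
# on var2, so d(loss)/d(var2) = 2 * var2 regardless of the data.
def simulate_var2_grads(var2=0.5, steps=5, lr=0.001,
                        beta1=0.9, beta2=0.999, eps=1e-7):
    m = v = 0.0  # first and second moment estimates
    grads = []
    for t in range(1, steps + 1):
        g = 2.0 * var2
        grads.append(g)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        var2 -= lr * m_hat / (v_hat ** 0.5 + eps)
    return grads

grads = simulate_var2_grads()
for g in grads:
    print(round(g, 6))
```

The first values come out close to 1, 0.998 and 0.996, in line with the second column printed above. The first column is the batch MSE (the gradient with respect to var1), so it fluctuates from batch to batch.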

Note there is no need to add g.watch([self.var1, self.var2]), though adding it doesn't change the issue.

How do I keep track of those gradients (like I keep track of var1 and var2)? What does tf.print() "see" that I can't see?

Giora Simchoni

1 Answer

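Following this answer, it seems that once you go manual like I did, TF may not be executing eagerly. By default, model.fit() compiles the training step into a graph, so the tensors stored in call() are symbolic and have no numpy() method, whereas tf.print() is a graph op that runs with the actual values at every step. The solution is to add run_eagerly=True to the model.compile() line above: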

model.compile(optimizer='adam', run_eagerly=True)

Then I can call .numpy() on my grads tensors without a problem, e.g.:

grad1_list = []
grad2_list = []
for i in range(100):
    if i % 10 == 0:
        print('step %d' % i)
    model.fit([X_train, y_train], None,
              batch_size=32, epochs=1, validation_split=0.1, verbose=0)
    grad1, grad2 = model.layers[-1].get_gradients()
    grad1_list.append(grad1.numpy())
    grad2_list.append(grad2.numpy())

plt.plot(grad1_list, label='grad1')
plt.plot(grad2_list, 'r', label='grad2')
plt.legend()
plt.show()

[Plot: grad1 and grad2 after each epoch]
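
Running eagerly can slow training down. A possible alternative, not part of the solution above, is to stay in graph mode and copy the gradients into a non-trainable tf.Variable, which can be read with .numpy() once the step has finished. The sketch below is standalone core TensorFlow rather than the Keras layer from the question; grad_store and loss_and_grads are hypothetical names, and the same loss is reproduced with tf.* ops.

```python
import tensorflow as tf

var1 = tf.Variable(0.5)
var2 = tf.Variable(0.5)
# Non-trainable buffer holding the most recent gradients.
grad_store = tf.Variable([0.0, 0.0], trainable=False)

@tf.function  # graph mode, similar to what model.fit() uses by default
def loss_and_grads(y_true, y_pred):
    with tf.GradientTape() as tape:
        loss = var1 * tf.reduce_mean(tf.square(y_true - y_pred)) + var2 ** 2
    grads = tape.gradient(loss, [var1, var2])
    # assign() is a stateful graph op, so it runs on every call.
    grad_store.assign(tf.stack(grads))
    return loss

y_true = tf.constant([1.0, 2.0, 3.0])
y_pred = tf.constant([0.0, 0.0, 0.0])
loss_and_grads(y_true, y_pred)

# Outside the graph the variable holds concrete values:
# MSE = (1 + 4 + 9) / 3 for var1, and 2 * 0.5 for var2.
stored = grad_store.numpy()
print(stored)
```

Whether the same pattern fits inside the custom layer's call() depends on the Keras version, so treat this as a starting point rather than a drop-in replacement.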

Giora Simchoni