
You can specify the loss in a Keras/TensorFlow model in two ways: with `add_loss`, or with the `loss` argument of `compile`. Since the gradient is taken with respect to a single loss in order to do the weight updates, I would imagine there needs to be one function that somehow combines those losses. Are they just added together?

For example, let's say I have the following model. The only important lines are `self.add_loss(kl_loss)` and `autoencoder.compile(optimizer=optimizer, loss=r_loss, metrics=[r_loss])`.

import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras import backend as K

class Autoencoder(Model):
  def __init__(self):
    super(Autoencoder, self).__init__()
    encoder_input = layers.Input(shape=INPUT_SHAPE, name='encoder_input')
    x = encoder_input
    # ...
    x = layers.Flatten()(x)
    mu = layers.Dense(LATENT_DIM, name='mu')(x)
    log_var = layers.Dense(LATENT_DIM, name='log_var')(x)

    def sample(args):
      mu, log_var = args
      epsilon = tf.random.normal(shape=K.shape(mu), mean=0., stddev=1.)
      return mu + tf.math.exp(log_var / 2) * epsilon

    encoder_output = layers.Lambda(sample, name='encoder_output')([mu, log_var])
    self.encoder = Model(encoder_input, outputs=[encoder_output, mu, log_var])

    self.decoder = tf.keras.Sequential([
      layers.Input(shape=LATENT_DIM),
      # ...
    ])

  def call(self, x):
    encoded, mu, log_var = self.encoder(x)
    kl_loss = tf.math.reduce_mean(-0.5 * tf.math.reduce_sum(1 + log_var - tf.math.square(mu) - tf.math.exp(log_var)))
    self.add_loss(kl_loss)
    decoded = self.decoder(encoded)
    return decoded

def train_autoencoder():
  autoencoder = Autoencoder()

  def r_loss(y_true, y_pred):
    return tf.math.reduce_sum(tf.math.square(y_true - y_pred), axis=[1, 2, 3])

  optimizer = tf.keras.optimizers.Adam(1e-4)

  autoencoder.compile(optimizer=optimizer, loss=r_loss, metrics=[r_loss])
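
To see where the `add_loss` term ends up, here is a minimal sketch (assuming TF 2.x; the `Toy` model and its names are invented for illustration): terms registered via `add_loss` inside `call` are collected in `model.losses` after a forward pass.

```python
import tensorflow as tf

# Illustrative toy model: registers one add_loss term in call(),
# mirroring the kl_loss pattern in the autoencoder above.
class Toy(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        y = self.dense(x)
        self.add_loss(tf.reduce_mean(tf.square(y)))  # stand-in for kl_loss
        return y

model = Toy()
_ = model(tf.zeros((2, 3)))  # forward pass populates model.losses
print(len(model.losses))  # 1 entry: the term added in call()
```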

When I train my model, I see the following values:

Epoch 00001: saving model to models/autoencoder/cp-autoencoder.ckpt
1272/1272 [==============================] - 249s 191ms/step - batch: 635.5000 - size: 1.0000 - loss: 5300.4540 - r_loss: 2856.8228

Both losses go down together. What exactly is the loss in the above snippet?

  • The `r_loss` displayed in the result is the metric you specified (loss functions are usable as metrics); the loss actually used by your model is the one specified in the `call` function. Check [here](https://stackoverflow.com/questions/50063613/what-is-the-purpose-of-the-add-loss-function-in-keras). – Mohamed abdelmagid Oct 11 '21 at 07:09
  • @aim97, ok I understand why the `r_loss` is printed (because it's also the metric). However, when I set `r_loss` to return `0.0`, I get `ValueError: Variable ... has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.` which doesn't make sense unless `r_loss` is used for the gradient calculation. – under_the_sea_salad Oct 11 '21 at 07:18
  • 2
    @coneyhelixlake if u use add loss you should set model.compile(loss=None, ...) – Marco Cerliani Oct 11 '21 at 07:29
  • @MarcoCerliani, I want to be able to use both of those losses though. – under_the_sea_salad Oct 11 '21 at 07:33
  • @aim97, I just multiplied `r_loss`'s return value by 100 and I am seeing a loss 100x bigger: `351/Unknown - 68s 186ms/step - batch: 175.0000 - size: 1.0000 - loss: 250743.7114 - r_loss: 222452.7656`, so I am still not sure how these losses are used for the gradient calculation. – under_the_sea_salad Oct 11 '21 at 07:35
  • 1
    so your question is how to use multiple losses with add_loss? in general only a single loss is minimized which is the simple sum of the losses involved – Marco Cerliani Oct 11 '21 at 07:36
  • @MarcoCerliani, setting `loss=None` does not work. I get the same error (`ValueError: Variable ... has 'None' for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.`) – under_the_sea_salad Oct 11 '21 at 07:37
  • @MarcoCerliani, close. I want to be able to use both `add_loss` AND the loss specified by `compile`. For whatever reason, after a lot of debugging, this is the way my model needs to be implemented. I could try to see if I could reimplement `compile`'s loss as another `add_loss` and a simple sum of them would suffice. – under_the_sea_salad Oct 11 '21 at 07:39
  • 2
    you should use add_loss multiple times to add multiple losses – Marco Cerliani Oct 11 '21 at 07:50
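
Putting the comments' claim to the test, here is a hedged sketch (assuming TF 2.x; the toy model and the constant 10.0 penalty are invented for illustration). If the loss reported by `fit` is the compiled loss plus the sum of all `add_loss` terms, while the metric excludes them, the gap between the two should equal the constant penalty:

```python
import numpy as np
import tensorflow as tf

# Illustrative toy model: adds a constant 10.0 penalty via add_loss,
# standing in for the kl_loss in the question's autoencoder.
class Toy(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        y = self.dense(x)
        self.add_loss(tf.constant(10.0))  # constant add_loss term
        return y

model = Toy()
model.compile(optimizer="sgd", loss="mse", metrics=["mse"])
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
hist = model.fit(x, y, epochs=1, verbose=0)

# Reported loss = compiled mse + add_loss terms; the mse metric
# excludes them, so the gap should be the constant penalty (about 10).
print(hist.history["loss"][0] - hist.history["mse"][0])
```

If this holds, the `loss` column in the training log above is `r_loss` plus `kl_loss` summed into a single objective, while the `r_loss` column is the metric alone.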

0 Answers