How to access all outputs from a single custom loss function in keras

Question

I'm trying to reproduce the architecture of the network proposed in this publication in tensorFlow. Being a total beginner to this, I've been using this tutorial as a base to work on, using tensorflow==2.3.2.

To train this network, they use a loss which implies outputs from two branches of the network at the same time, which made me look towards custom losses function in keras. I've got that you can define your own, as long as the definition of the function looks like the following:

def custom_loss(y_true, y_pred):

I also understood that you could give other arguments like so:

def loss_function(margin=0.3):
    def custom_loss(y_true, y_pred):
        # And now you can use margin

You then just have to call these while compiling your model. When it comes to using multiple outputs, the most common approach seem to be the one proposed here, where you would give several losses functions, one being called for each of your output. However, I could not find a solution to give several outputs to a loss function, which is what I need here.

To further explain it, here is a minimal working example showing what I've tried, which you can try for yourself in this collab.

import os
import tensorflow as tf
import keras.backend as K
from tensorflow.keras import datasets, layers, models, applications, losses
from tensorflow.keras.preprocessing import image_dataset_from_directory

_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

BATCH_SIZE = 32
IMG_SIZE = (160, 160)
IMG_SHAPE = IMG_SIZE + (3,)

train_dataset = image_dataset_from_directory(train_dir,
                                             shuffle=True,
                                             batch_size=BATCH_SIZE,
                                             image_size=IMG_SIZE)

validation_dataset = image_dataset_from_directory(validation_dir,
                                                  shuffle=True,
                                                  batch_size=BATCH_SIZE,
                                                  image_size=IMG_SIZE)

data_augmentation = tf.keras.Sequential([
  layers.experimental.preprocessing.RandomFlip('horizontal'),
  layers.experimental.preprocessing.RandomRotation(0.2),
])
preprocess_input = applications.resnet50.preprocess_input
base_model = applications.ResNet50(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = True
conv = layers.Conv2D(filters=128, kernel_size=(1,1))
global_pooling = layers.GlobalAveragePooling2D()
horizontal_pooling = layers.AveragePooling2D(pool_size=(1, 5))
reshape = layers.Reshape((-1, 128))

def custom_loss(y_true, y_pred):
    print(y_pred.shape)
    # Do some stuffs involving both outputs
    # Returning something trivial here for correct behavior
    return K.mean(y_pred)

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=True)

first_branch = global_pooling(x)

second_branch = conv(x)
second_branch = horizontal_pooling(second_branch)
second_branch = reshape(second_branch)

model = tf.keras.Model(inputs, [first_branch, second_branch])
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate),
              loss=custom_loss,
              metrics=['accuracy'])
model.summary()

initial_epochs = 10
history = model.fit(train_dataset,
                    epochs=initial_epochs,
                    validation_data=validation_dataset)

while doing so, I thought that the y_pred given to loss function would be a list, containing both outputs. However, while running it, what I've got in stdout was this:

Epoch 1/10
(None, 2048)
(None, 5, 128)

What I understand from this is that the loss function is called with every output, one by one, instead of being called once with all the outputs, which means I can't define a loss that would use both the outputs at the same time. Is there any way to achieve this?

Please let me know if I'm unclear, or if you need further details.

For me, it's unclear what you really need. Can you explain in a simple way? Are you wondering how can you use the loss function for multi output? Each of them should use a different loss function? — Innat, Mar 05 '21 at 19:20
Thanks for taking the time to answer this! No, I would need one loss function in which I get all outputs. — Miorii, Mar 05 '21 at 19:23

score 1 · Answer 1 · answered Mar 05 '21 at 20:09

Ok, here is an easy way to achieve this. We can achieve this by using the loss_weights parameter. We can weigh multiple outputs exactly the same so that we can get the combined loss results. So, for two output we can do

loss_weights = 1*output1 + 1*output2

In your case, your network has two outputs, by the name they are reshape, and global_average_pooling2d. You can do now as follows

# calculation of loss for one output, i.e. reshape
def reshape_loss(y_true, y_pred):
    # do some math with these two 
    return K.mean(y_pred)

# calculation of loss for another output, i.e. global_average_pooling2d
def gap_loss(y_true, y_pred):
    # do some math with these two 
    return K.mean(y_pred)

And while compiling now you need to do as this

model.compile(
    optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate), 
    loss = {
         'reshape':reshape_loss, 
         'global_average_pooling2d':gap_loss
      },
    loss_weights = {
        'reshape':1., 
        'global_average_pooling2d':1.
     }
    )

Now, the loss is the result of 1.*reshape + 1.*global_average_pooling2d.

Thank you for taking the time to detail this answer, but this isn't exactly what I need, I'll try to explain it further. In my first loss, namely reshape_loss here, I compute one loss, but also some additional informations, that are required to computing the other loss, namely gap_loss. I thought the easiest way to do so was to define a single loss function, such as total_loss(y_true, y_pred1, y_pred2). In the model you suggest, I can't pass any information between the two losses. Is there any way to do so? — Miorii, Mar 06 '21 at 09:33
Hello Miorii, I understand precisely what you are looking for as I am also looking for a way to implement something similar. Have you found a way to implement this? @M.Innat, thank you for the detailed example! From your example it is clear that the total loss function will be as follows: `loss = 1*reshape_loss + 1* gap_loss`. Now, If I want to add another function based to this output. So the function will be `loss = 1*reshape_loss + 1* gap_loss + 1 * additional_loss`, where the `additional_loss` is function of `y_ture1, y_ture2, y_pred1, y_pred2`. Is there any way to implement this? — tasrif, Nov 08 '22 at 11:43

score 1 · Accepted Answer · answered Aug 13 '21 at 03:08

I had the same problem trying to implement Triplet_Loss function.

I refered to Keras's implementation for Siamese Network with Triplet Loss Function but something didnt work out and I had to implement the network by myself.

def get_siamese_model(input_shape, conv2d_filters):
    # Define the tensors for the input images
    anchor_input = Input(input_shape, name="Anchor_Input")
    positive_input = Input(input_shape, name="Positive_Input")
    negative_input = Input(input_shape, name="Negative_Input")

    body = build_body(input_shape, conv2d_filters)
    # Generate the feature vectors for the images
    encoded_a = body(anchor_input)
    encoded_p = body(positive_input)
    encoded_n = body(negative_input)

    distance = DistanceLayer()(encoded_a, encoded_p, encoded_n)
    # Connect the inputs with the outputs
    siamese_net = Model(inputs=[anchor_input, positive_input, negative_input],
                        outputs=distance)
    return siamese_net

and the "bug" was in DistanceLayer Implementation Keras posted (also in the same link above).

class DistanceLayer(tf.keras.layers.Layer):
    """
    This layer is responsible for computing the distance between the anchor
    embedding and the positive embedding, and the anchor embedding and the
    negative embedding.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, anchor, positive, negative):
        ap_distance = tf.math.reduce_sum(tf.math.square(anchor - positive), axis=1, keepdims=True, name='ap_distance')
        an_distance = tf.math.reduce_sum(tf.math.square(anchor - negative), axis=1, keepdims=True, name='an_distance')
        return (ap_distance, an_distance)

When I was training the model, the loss function took only one of the vectors ap_distance or an_distance.

FINALLY, THE FIX WAS to concatenate the vectors together (along axis=1 this case) and on the loss function, take them apart:

    def call(self, anchor, positive, negative):
        ap_distance = tf.math.reduce_sum(tf.math.square(anchor - positive), axis=1, keepdims=True, name='ap_distance')
        an_distance = tf.math.reduce_sum(tf.math.square(anchor - negative), axis=1, keepdims=True, name='an_distance')
        return tf.concat([ap_distance, an_distance], axis=1)

on my custom loss:

def get_loss(margin=1.0):
    def triplet_loss(y_true, y_pred):
        # The output of the network is NOT A tuple, but a matrix shape (batch_size, 2),
        # containing the distances between the anchor and the positive example,
        # and the anchor and the negative example.
        ap_distance = y_pred[:, 0]
        an_distance = y_pred[:, 1]

        # Computing the Triplet Loss by subtracting both distances and
        # making sure we don't get a negative value.
        loss = tf.math.maximum(ap_distance - an_distance + margin, 0.0)
        # tf.print("\n", ap_distance, an_distance)
        # tf.print(f"\n{loss}\n")
        return loss

    return triplet_loss

How to access all outputs from a single custom loss function in keras

2 Answers2

Linked