I am working on a multi-output image classification problem. My model has 1 input and 3 outputs. I want to train the model with a custom loss function that uses a loss-weight distribution: a weighted combination of the per-output classification losses plus an additional loss computed from all the true labels and predictions together. For example, each output uses a CategoricalCrossentropy loss, and these are combined with the extra loss term. In my understanding, I cannot pass all the true labels and predictions (ytrue1, ytrue2, ..., ypred1, ypred2, ...) to such an additional loss function through the model.compile method.
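For contrast, a standard multi-output compile only lets each loss see its own (y_true, y_pred) pair, roughly as in the sketch below (output names match the demo model further down), so there is no place to plug in a loss that looks at all three outputs at once:
## Standard per-output compile (for contrast only): each loss receives just its
## own (y_true, y_pred), so a joint loss over all three outputs cannot go here.
model.compile(optimizer='adam',
              loss={'pred_1': 'categorical_crossentropy',
                    'pred_2': 'categorical_crossentropy',
                    'pred_3': 'categorical_crossentropy'},
              loss_weights={'pred_1': 0.98, 'pred_2': 0.01, 'pred_3': 0.01})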
I have found a solution mentioned here and here. In those solutions, the problem was solved with the model.add_loss method. As my problem is similar, I have followed the same approach: I added extra input layers for the true labels and compiled the model without a loss function.
The code for the model, based on those solutions, is as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import regularizers, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input
from tensorflow.keras.initializers import he_normal
from tensorflow.keras.callbacks import LearningRateScheduler, TensorBoard
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import get_file
from tensorflow.keras import backend as K
## Number of batch size and training epochs
batch_size = 128
epochs = 10
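The arrays used below (x_train, y_c_train, etc.) are assumed to be pre-loaded; for a self-contained demo they can be stubbed out, for example with CIFAR-10 images and random one-hot labels matching the 2-, 7- and 10-class heads defined later (placeholder data only):
## (Assumption) placeholder data so the demo runs end to end: CIFAR-10 images
## with random one-hot labels for the 2-, 7- and 10-class output heads.
import numpy as np
(x_train, _), (x_val, _) = cifar10.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def random_one_hot(n_samples, n_classes):
    return tf.keras.utils.to_categorical(
        np.random.randint(0, n_classes, size=n_samples), n_classes)

y_c_train, y_m_train, y_f_train = [random_one_hot(len(x_train), n) for n in (2, 7, 10)]
y_c_val, y_m_val, y_f_val = [random_one_hot(len(x_val), n) for n in (2, 7, 10)]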
##### dataset generator (dummy):
## training dataset
train_dataset = tf.data.Dataset.from_tensor_slices(
    ((x_train, y_c_train, y_m_train, y_f_train),
     (y_c_train, y_m_train, y_f_train)))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
## training_validation dataset
val_dataset_train = tf.data.Dataset.from_tensor_slices(
    ((x_val, y_c_val, y_m_val, y_f_val),
     (y_c_val, y_m_val, y_f_val)))
val_dataset_train = val_dataset_train.batch(batch_size)
# validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val,(y_c_val,y_m_val,y_f_val)))
val_dataset = val_dataset.batch(batch_size)
##### Loss Weights Modifier:
class LossWeightsModifier(keras.callbacks.Callback):
    def __init__(self, LW1, LW2, LW3):
        self.LW1 = LW1
        self.LW2 = LW2
        self.LW3 = LW3

    def on_epoch_end(self, epoch, logs={}):
        if epoch == 8:
            self.LW1.assign(0.1)
            self.LW2.assign(0.8)
            self.LW3.assign(0.1)
        if epoch == 18:
            self.LW1.assign(0.1)
            self.LW2.assign(0.2)
            self.LW3.assign(0.7)
        if epoch == 28:
            self.LW1.assign(0)
            self.LW2.assign(0)
            self.LW3.assign(1)
#----------------------- model definition ---------------------------
LW1 = tf.Variable(0.98, dtype="float32", name="LW1") # A1 in paper
LW2 = tf.Variable(0.01, dtype="float32", name="LW2") # A2 in paper
LW3 = tf.Variable(0.01, dtype="float32", name="LW3") # A3 in paper
change_lw = LossWeightsModifier(LW1, LW2, LW3)
##### Learning rate modifier:
def scheduler(epoch):
    learning_rate_init = 0.003
    if epoch > 2:
        learning_rate_init = 0.0005
    if epoch > 5:
        learning_rate_init = 0.0001
    return learning_rate_init
def CustomLoss(y_true_c, y_true_m, y_true_f, y_pred_c, y_pred_m, y_pred_f, LW1, LW2, LW3):
    def cce_loss_fn(y_true, y_pred, name):
        cce = tf.keras.losses.CategoricalCrossentropy(name=name)
        return cce(y_true, y_pred)

    ## dummy example
    ## The actual function takes (y_true_c, y_true_m, y_true_f, y_pred_c, y_pred_m, y_pred_f)
    ## as input and calculates an additional loss based on all the true and predicted values.
    def additional_loss(loss_c, loss_m, loss_f):
        ## Calculation as required
        loss_add = tf.cond(tf.math.less(loss_c, loss_m),
                           true_fn=lambda: tf.constant(0.0),
                           false_fn=lambda: tf.constant(0.1))
        return loss_add

    loss_c = cce_loss_fn(y_true_c, y_pred_c, 'l1')
    loss_m = cce_loss_fn(y_true_m, y_pred_m, 'l2')
    loss_f = cce_loss_fn(y_true_f, y_pred_f, 'l3')
    loss_a = additional_loss(loss_c, loss_m, loss_f)
    # loss_a = 0.1  # fixed value previously used for debugging
    loss = (1.0 - loss_a) * (LW1 * loss_c + LW2 * loss_m + LW3 * loss_f) - loss_a
    return loss
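A quick eager sanity check of the loss on random tensors (batch size and class counts are chosen to match the three heads defined below; purely illustrative):
## (Illustrative) evaluate CustomLoss eagerly on random batches of size 4
_yt = [tf.one_hot(tf.random.uniform((4,), maxval=n, dtype=tf.int32), n) for n in (2, 7, 10)]
_yp = [tf.nn.softmax(tf.random.uniform((4, n))) for n in (2, 7, 10)]
print(CustomLoss(*_yt, *_yp, LW1, LW2, LW3))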
### The demo model is as follows:
img_input = Input(shape=(32, 32, 3), name='input')
#--- block 1 ---
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
c_1_bch = Flatten(name='c1_flatten')(x)
pred_1 = Dense(2, activation='softmax', name='pred_1')(c_1_bch)
#--- block 3 ---
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
c_2_bch = Flatten(name='c2_flatten')(x)
pred_2 = Dense(7, activation='softmax', name='pred_2')(c_2_bch)
#--- block 4 ---
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
c_3_bch = Flatten(name='c3_flatten')(x)
pred_3 = Dense(10, activation='softmax', name='pred_3')(c_3_bch)
y_c = Input(shape=(2,), name='input_yc')
y_m = Input(shape=(7,), name='input_ym')
y_f = Input(shape=(10,), name='input_yf')
model = Model([img_input,y_c,y_m,y_f], [pred_1, pred_2, pred_3], name='M-Output_Model')
model.add_loss(CustomLoss(y_c, y_m, y_f, pred_1, pred_2, pred_3, LW1, LW2, LW3))
##### Compile and Train
opt = optimizers.Adam()
model.compile(optimizer=opt,
              loss=None,
              metrics={'pred_1': 'accuracy',
                       'pred_2': 'accuracy',
                       'pred_3': 'accuracy'})
change_lr = LearningRateScheduler(scheduler) # Changing learning rate after epoch_ends
change_lw = LossWeightsModifier(LW1, LW2, LW3) # modifying Loss weights
cbks = [change_lr, change_lw]
history = model.fit(train_dataset,
                    epochs=epochs,
                    validation_data=val_dataset_train,
                    callbacks=cbks,
                    verbose=1)
In the mentioned solutions, the model was trained with model.fit without passing the true labels.
This process works, but it does not report accuracy or any other metrics (since y = None in model.fit). To solve this, I compiled the model with an accuracy metric for each output layer and also trained the model with the labels (y = [y_c, y_m, y_f]).
This works fine and outputs the accuracy metrics.
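For reference, the equivalent array-based call (the label arrays go into x, where they feed the add_loss inputs, and into y, where they feed the compiled metrics):
## Array-based equivalent of the tf.data pipeline above
history = model.fit(x=[x_train, y_c_train, y_m_train, y_f_train],
                    y=[y_c_train, y_m_train, y_f_train],
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=([x_val, y_c_val, y_m_val, y_f_val],
                                     [y_c_val, y_m_val, y_f_val]),
                    callbacks=cbks,
                    verbose=1)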
Question-1:
Is this the correct way of implementing this model? I am providing the training and validation labels, whereas in the referenced solutions y = None was used in model.fit. I hope this does not cause any issue.
Question-2:
I am also using a loss-weight distribution to calculate the total loss. Is this the correct way of using loss weights in a custom loss function? Note: I am updating the loss-weight values with a callback.
Is there any other method for training a multi-output model with a custom loss function like the one I described? Also, if there is a simpler way to implement this, please advise.
Update:
I have found that it is simpler to use model.add_metric to check the model metrics while training. I have created custom metric functions and used them with model.add_metric.
Code for custom metrics:
### check accuracy
def custom_acc(y_true, y_pred):
    return tf.keras.metrics.categorical_accuracy(y_true, y_pred)

### check level loss
def lvl_loss(y_true, y_pred):
    return tf.keras.losses.CategoricalCrossentropy()(y_true, y_pred)
model.add_metric(custom_acc(y_c,pred_1), name="lvl_1_acc", aggregation="mean")
model.add_metric(lvl_loss(y_c,pred_1), name="lvl_1_loss", aggregation="mean")
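## (Assumption) the same pattern extends to the other two outputs; the
## "lvl_2_*" / "lvl_3_*" names are just illustrative:
model.add_metric(custom_acc(y_m, pred_2), name="lvl_2_acc", aggregation="mean")
model.add_metric(lvl_loss(y_m, pred_2), name="lvl_2_loss", aggregation="mean")
model.add_metric(custom_acc(y_f, pred_3), name="lvl_3_acc", aggregation="mean")
model.add_metric(lvl_loss(y_f, pred_3), name="lvl_3_loss", aggregation="mean")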
## In this case the data generator can be like this:
## No need to pass y for training in model.fit (y=None)
train_dataset = tf.data.Dataset.from_tensor_slices(((x_train,y_c_train,y_m_train,y_f_train),))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
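With the metrics attached via model.add_metric, the fit call no longer needs explicit targets, for example:
## Targets come in through the input tuple, so no y is passed here
history = model.fit(train_dataset, epochs=epochs, callbacks=cbks, verbose=1)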
Question-3:
Using tf.keras.losses.CategoricalCrossentropy with model.add_metric to check the per-level loss does not give any error, but is this the right way?