I am working on a multi-output image classification problem. My model has 1 input and 3 outputs. I want to train the model with a custom loss function that uses a loss-weight distribution: a weighted combination of the per-output classification losses plus an additional loss computed from all the true labels and predictions together. For example, each output uses a CategoricalCrossentropy loss, and these are combined with the extra loss term. In my understanding, I cannot pass all the true labels and predictions (ytrue1, ytrue2, ..., ypred1, ypred2, ...) to such an additional loss function through the model.compile method.
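For contrast, a standard multi-output compile only lets each loss see its own (y_true, y_pred) pair, roughly as in the sketch below (output names match the demo model further down), so there is no place to plug in a loss that looks at all three outputs at once:
## Standard per-output compile (for contrast only): each loss receives just its
## own (y_true, y_pred), so a joint loss over all three outputs cannot go here.
model.compile(optimizer='adam',
              loss={'pred_1': 'categorical_crossentropy',
                    'pred_2': 'categorical_crossentropy',
                    'pred_3': 'categorical_crossentropy'},
              loss_weights={'pred_1': 0.98, 'pred_2': 0.01, 'pred_3': 0.01})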
I have found a solution mentioned here and here. In those solutions, the problem was solved with the model.add_loss method. As my problem is similar, I have followed the same approach: I added extra input layers for the true labels and compiled the model without a loss function.
The code for the model, based on those solutions, is as follows:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import regularizers, optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Input
from tensorflow.keras.initializers import he_normal
from tensorflow.keras.callbacks import LearningRateScheduler, TensorBoard
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import get_file
from tensorflow.keras import backend as K
## Number of batch size and training epochs
batch_size = 128
epochs = 10
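The arrays used below (x_train, y_c_train, etc.) are assumed to be pre-loaded; for a self-contained demo they can be stubbed out, for example with CIFAR-10 images and random one-hot labels matching the 2-, 7- and 10-class heads defined later (placeholder data only):
## (Assumption) placeholder data so the demo runs end to end: CIFAR-10 images
## with random one-hot labels for the 2-, 7- and 10-class output heads.
import numpy as np
(x_train, _), (x_val, _) = cifar10.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def random_one_hot(n_samples, n_classes):
    return tf.keras.utils.to_categorical(
        np.random.randint(0, n_classes, size=n_samples), n_classes)

y_c_train, y_m_train, y_f_train = [random_one_hot(len(x_train), n) for n in (2, 7, 10)]
y_c_val, y_m_val, y_f_val = [random_one_hot(len(x_val), n) for n in (2, 7, 10)]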
##### dataset generator (dummy):
## training dataset
train_dataset = tf.data.Dataset.from_tensor_slices(
    ((x_train, y_c_train, y_m_train, y_f_train),
     (y_c_train, y_m_train, y_f_train)))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
## training_validation dataset
val_dataset_train = tf.data.Dataset.from_tensor_slices(
    ((x_val, y_c_val, y_m_val, y_f_val),
     (y_c_val, y_m_val, y_f_val)))
val_dataset_train = val_dataset_train.batch(batch_size)
# validation dataset
val_dataset = tf.data.Dataset.from_tensor_slices((x_val,(y_c_val,y_m_val,y_f_val)))
val_dataset = val_dataset.batch(batch_size)
##### Loss Weights Modifier:
class LossWeightsModifier(keras.callbacks.Callback):
    def __init__(self, LW1, LW2, LW3):
        self.LW1 = LW1
        self.LW2 = LW2
        self.LW3 = LW3

    def on_epoch_end(self, epoch, logs={}):
        if epoch == 8:
            self.LW1.assign(0.1)
            self.LW2.assign(0.8)
            self.LW3.assign(0.1)
        if epoch == 18:
            self.LW1.assign(0.1)
            self.LW2.assign(0.2)
            self.LW3.assign(0.7)
        if epoch == 28:
            self.LW1.assign(0)
            self.LW2.assign(0)
            self.LW3.assign(1)
#----------------------- model definition ---------------------------
LW1 = tf.Variable(0.98, dtype="float32", name="LW1") # A1 in paper
LW2 = tf.Variable(0.01, dtype="float32", name="LW2") # A2 in paper
LW3 = tf.Variable(0.01, dtype="float32", name="LW3") # A3 in paper
change_lw = LossWeightsModifier(LW1, LW2, LW3)
##### Learning rate modifier:
def scheduler(epoch):
    learning_rate_init = 0.003
    if epoch > 2:
        learning_rate_init = 0.0005
    if epoch > 5:
        learning_rate_init = 0.0001
    return learning_rate_init
def CustomLoss(y_true_c, y_true_m, y_true_f, y_pred_c, y_pred_m, y_pred_f, LW1, LW2, LW3):
    def cce_loss_fn(y_true, y_pred, name):
        cce = tf.keras.losses.CategoricalCrossentropy(name=name)
        return cce(y_true, y_pred)

    ## dummy example
    ## The actual function takes (y_true_c, y_true_m, y_true_f, y_pred_c, y_pred_m, y_pred_f)
    ## as input and calculates an additional loss based on all the true and predicted values.
    def additional_loss(loss_c, loss_m, loss_f):
        ## Calculation as required
        loss_add = tf.cond(tf.math.less(loss_c, loss_m),
                           true_fn=lambda: tf.constant(0.0),
                           false_fn=lambda: tf.constant(0.1))
        return loss_add

    loss_c = cce_loss_fn(y_true_c, y_pred_c, 'l1')
    loss_m = cce_loss_fn(y_true_m, y_pred_m, 'l2')
    loss_f = cce_loss_fn(y_true_f, y_pred_f, 'l3')
    loss_a = additional_loss(loss_c, loss_m, loss_f)
    # loss_a = 0.1  # fixed value previously used for debugging
    loss = (1.0 - loss_a) * (LW1 * loss_c + LW2 * loss_m + LW3 * loss_f) - loss_a
    return loss
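A quick eager sanity check of the loss on random tensors (batch size and class counts are chosen to match the three heads defined below; purely illustrative):
## (Illustrative) evaluate CustomLoss eagerly on random batches of size 4
_yt = [tf.one_hot(tf.random.uniform((4,), maxval=n, dtype=tf.int32), n) for n in (2, 7, 10)]
_yp = [tf.nn.softmax(tf.random.uniform((4, n))) for n in (2, 7, 10)]
print(CustomLoss(*_yt, *_yp, LW1, LW2, LW3))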
### The demo model is as follows:
img_input = Input(shape=(32, 32, 3), name='input')
#--- block 1 ---
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
c_1_bch = Flatten(name='c1_flatten')(x)
pred_1 = Dense(2, activation='softmax', name='pred_1')(c_1_bch)
#--- block 3 ---
x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
c_2_bch = Flatten(name='c2_flatten')(x)
pred_2 = Dense(7, activation='softmax', name='pred_2')(c_2_bch)
#--- block 4 ---
x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
c_3_bch = Flatten(name='c3_flatten')(x)
pred_3 = Dense(10, activation='softmax', name='pred_3')(c_3_bch)
y_c = Input(shape=(2,), name='input_yc')
y_m = Input(shape=(7,), name='input_ym')
y_f = Input(shape=(10,), name='input_yf')
model = Model([img_input,y_c,y_m,y_f], [pred_1, pred_2, pred_3], name='M-Output_Model')
model.add_loss(CustomLoss(y_c, y_m, y_f, pred_1, pred_2, pred_3, LW1, LW2, LW3))
##### Compile and Train
opt = optimizers.Adam()
model.compile(optimizer=opt,
              loss=None,
              metrics={'pred_1': 'accuracy',
                       'pred_2': 'accuracy',
                       'pred_3': 'accuracy'})
change_lr = LearningRateScheduler(scheduler) # Changing learning rate after epoch_ends
change_lw = LossWeightsModifier(LW1, LW2, LW3) # modifying Loss weights
cbks = [change_lr, change_lw]
history = model.fit(train_dataset,
                    epochs=epochs,
                    validation_data=val_dataset_train,
                    callbacks=cbks,
                    verbose=1)
In the mentioned solutions, the model was trained with model.fit without passing the true labels.
This process works, but it does not report accuracy or any other metrics (since y = None in model.fit). To solve this, I compiled the model with an accuracy metric for each output layer and also trained the model with the labels (y = [y_c, y_m, y_f]).
This works fine and outputs the accuracy metrics.
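For reference, the equivalent array-based call (the label arrays go into x, where they feed the add_loss inputs, and into y, where they feed the compiled metrics):
## Array-based equivalent of the tf.data pipeline above
history = model.fit(x=[x_train, y_c_train, y_m_train, y_f_train],
                    y=[y_c_train, y_m_train, y_f_train],
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=([x_val, y_c_val, y_m_val, y_f_val],
                                     [y_c_val, y_m_val, y_f_val]),
                    callbacks=cbks,
                    verbose=1)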
Question-1:
Is this the correct way of implementing this model? I am providing the training and validation labels, whereas in the referenced solutions y = None was used in model.fit. I hope this does not cause any issue.
Question-2:
I am also using a loss-weight distribution to calculate the total loss. Is this the correct way of using loss weights in a custom loss function? Note: I am updating the loss-weight values with a callback.
Is there any other method for training a multi-output model with a custom loss function like the one I described? Also, if there is a simpler way to implement this, please advise.
Update:
I have found that it is simpler to use model.add_metric to check the model metrics while training. I have created custom metric functions and used them with model.add_metric.
Code for custom metrics:
### check accuracy
def custom_acc(y_true, y_pred):
    return tf.keras.metrics.categorical_accuracy(y_true, y_pred)

### check level loss
def lvl_loss(y_true, y_pred):
    return tf.keras.losses.CategoricalCrossentropy()(y_true, y_pred)
model.add_metric(custom_acc(y_c,pred_1), name="lvl_1_acc", aggregation="mean")
model.add_metric(lvl_loss(y_c,pred_1), name="lvl_1_loss", aggregation="mean")
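## (Assumption) the same pattern extends to the other two outputs; the
## "lvl_2_*" / "lvl_3_*" names are just illustrative:
model.add_metric(custom_acc(y_m, pred_2), name="lvl_2_acc", aggregation="mean")
model.add_metric(lvl_loss(y_m, pred_2), name="lvl_2_loss", aggregation="mean")
model.add_metric(custom_acc(y_f, pred_3), name="lvl_3_acc", aggregation="mean")
model.add_metric(lvl_loss(y_f, pred_3), name="lvl_3_loss", aggregation="mean")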
## In this case the data generator can be like this:
## No need to pass y for training in model.fit (y=None)
train_dataset = tf.data.Dataset.from_tensor_slices(((x_train,y_c_train,y_m_train,y_f_train),))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
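With the metrics attached via model.add_metric, the fit call no longer needs explicit targets, for example:
## Targets come in through the input tuple, so no y is passed here
history = model.fit(train_dataset, epochs=epochs, callbacks=cbks, verbose=1)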
Question-3:
Using tf.keras.losses.CategoricalCrossentropy with model.add_metric to check the per-level loss does not give any error, but is this the right way?