
I have the following model:

def make_model(lr_rate=0.001):
    inp = tf.keras.Input(shape=(height, width, depth))
    x = tf.keras.layers.Conv2D(64, (4, 4), strides=(1, 1), use_bias=False, padding='same', name='Conv1')(inp)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
    x = tf.keras.layers.Conv2D(32, (3, 3), strides=(1, 1), use_bias=False, padding='same', name='Conv2')(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
    x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(512)(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
    x = tf.keras.layers.Dense(S1 * S2 * 13, name='scores', activation='softmax')(x)
    x = tf.keras.layers.Reshape((S1, S2, 13))(x)

    model = tf.keras.Model(inputs=inp, outputs=x)

    optimizer_ = tf.keras.optimizers.Adam(learning_rate=lr_rate)
    model.compile(loss=weighted_categorical_crossentropy(),  # also tried tf.keras.losses.CategoricalCrossentropy(axis=3) and yolo_loss_classification
                  optimizer=optimizer_,
                  metrics=[
                      tf.keras.metrics.TruePositives(name='tp'),
                      tf.keras.metrics.FalsePositives(name='fp'),
                      tf.keras.metrics.TrueNegatives(name='tn'),
                      tf.keras.metrics.FalseNegatives(name='fn'),
                      tf.keras.metrics.Precision(name='precision'),
                      tf.keras.metrics.Recall(name='recall'),
                      tf.keras.metrics.AUC(name='auc')
                  ])
    model.summary()
    return model

where I am using a custom weighted categorical crossentropy (CCE) loss, because the built-in CCE does not support class weighting in my case (my output has too many dimensions). The loss itself seems to work fine when I call it on a test sample, but for some reason it sometimes turns NaN:

from tensorflow.keras import backend as K

def weighted_categorical_crossentropy(weights=the_weights):

    weights = K.variable(weights)

    def loss(y_true, y_pred):
        # scale predictions so that the class probabilities of each sample sum to 1
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
        # clip to prevent NaNs and Infs
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        # weighted cross-entropy per grid cell, summed over the class axis and then over everything else
        loss = y_true * K.log(y_pred) * weights
        loss = K.sum(-K.sum(loss, axis=2))
        return loss

    return loss
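
For context, this is roughly how I call it on a test sample (the shapes and the uniform weights below are only placeholders standing in for my real S1, S2 and the_weights), and on inputs like this it returns a finite value:

import numpy as np
import tensorflow as tf

# placeholder shapes/weights just for this check
S1, S2, n_classes = 7, 7, 13
the_weights = np.ones(n_classes, dtype='float32')

loss_fn = weighted_categorical_crossentropy(the_weights)

# one-hot ground truth and softmax-normalized predictions for a batch of 2
y_true = tf.one_hot(np.random.randint(0, n_classes, size=(2, S1, S2)), depth=n_classes)
y_pred = tf.nn.softmax(tf.random.normal((2, S1, S2, n_classes)), axis=-1)

print(loss_fn(y_true, y_pred).numpy())  # finite scalar on a sample like this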

During training the first few epochs work fine, but at some point (usually between epoch 5 and 10) the loss suddenly becomes NaN, training crashes, and the following exception is thrown:

InvalidArgumentError: Graph execution error:
Detected at node 'assert_greater_equal/Assert/AssertGuard/Assert' defined at (most recent call last):
...
...
Node: 'assert_greater_equal/Assert/AssertGuard/Assert'
assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] [x (model_10/reshape_10/Reshape:0) = ] [[[[-nan -nan -nan...]]]...] [y (Cast_4/x:0) = ] [0]
     [[{{node assert_greater_equal/Assert/AssertGuard/Assert}}]] [Op:__inference_train_function_70732]

I think it has something to do with my activation functions: the loss turns NaN, which then crashes the run inside the metrics. So I really need to find out how to stop my loss from returning NaN. I have this issue with both ReLU and LeakyReLU, but with LeakyReLU the model trains better (or at least faster in the first few epochs). I have already checked my data and there are no NaN values in it.
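
For completeness, this is a guarded variant I am considering but have not confirmed as a fix: it clips before the normalization and adds K.epsilon() to the denominator so the division can never be 0/0, and uses tf.keras.callbacks.TerminateOnNaN() so a bad batch stops training cleanly instead of failing in the metric assertion. The name weighted_categorical_crossentropy_safe and the x_train/y_train placeholders are only for illustration:

def weighted_categorical_crossentropy_safe(weights=the_weights):

    weights = K.variable(weights)

    def loss(y_true, y_pred):
        # clip first so the normalization below cannot divide by zero
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True) + K.epsilon()
        loss = y_true * K.log(y_pred) * weights
        loss = K.sum(-K.sum(loss, axis=2))
        return loss

    return loss

# stop on the first NaN loss instead of crashing later in the metrics
# (x_train, y_train stand in for my actual training data)
model.fit(x_train, y_train, epochs=50,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])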
