
I'm building a custom U-Net for a semantic segmentation problem, but I'm seeing weird behavior in the way the loss and the metric are calculated during training, with very significant differences.

Update: there is a minimal reproducible example at the bottom.

I've read this question (1), this one (2), another one (3), and yet another one (4), but haven't found a suitable answer.

When training the model, I'm using the same function for the loss and for the metric, and the results vary wildly.

First example, with categorical_crossentropy (I'm using a very small toy set just to show it):

from tensorflow.python.keras import losses

model.compile(optimizer='adam', loss=losses.categorical_crossentropy,
    metrics=[losses.categorical_crossentropy])

And the output I get is:

 4/4 [===] - 3s 677ms/step - loss: 4.1023 - categorical_crossentropy: 1.0256 
           - val_loss: 1.3864 - val_categorical_crossentropy: 1.3864

As you can see, loss is about 4x the categorical_crossentropy metric.

If I use a custom metric, the difference is orders of magnitude:

from tensorflow.python.keras import backend as K
from tensorflow.python.keras.losses import categorical_crossentropy

def dice_cross_loss(y_true, y_pred, epsilon=1e-6, smooth=1):
    # categorical cross-entropy term
    ce_loss = categorical_crossentropy(y_true, y_pred)
    # soft Dice coefficient computed over the flattened masks
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice_coef = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + epsilon)
    # combined loss: cross-entropy minus the log of the Dice coefficient
    return ce_loss - K.log(dice_coef + epsilon)

model.compile(optimizer='adam', loss=dice_cross_loss,
    metrics=[dice_cross_loss])

When I run it, it's even worse:

4/4 [===] - 3s 682ms/step - loss: 20.9706 - dice_cross_loss: 5.2428 
          - val_loss: 4.3681 - val_dice_cross_loss: 4.3681

When using larger examples, the difference between the loss and the same loss reported as a metric can be more than tenfold.
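To rule out the loss function itself, I also checked that calling it directly on a fixed batch is deterministic (a quick sketch; the shapes here are just placeholders for my real masks):

import numpy as np
from tensorflow.python.keras import backend as K

# fixed random tensors standing in for a batch of labels and predictions
y_true = K.constant(np.random.randint(0, 2, size=(4, 20, 30, 2)).astype("float32"))
y_pred = K.constant(np.random.rand(4, 20, 30, 2).astype("float32"))

# called directly, the function is deterministic: same value on every call
print(K.eval(K.mean(dice_cross_loss(y_true, y_pred))))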

After reading (1), I removed from the model ALL the regularization layers that can behave differently at evaluation time: no dropout, no batchnorm. There is pooling, but that shouldn't be the cause of it.

The fitting code is unremarkable:

model.fit(x=data_x, y=data_y, batch_size=batch_size, epochs=epochs,
     verbose=1, validation_split=0.2, shuffle=True, workers=4)
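As a cross-check (sketch): calling evaluate() on the same data computes both numbers in the same (inference) mode, so they should agree there, just like val_loss and val_categorical_crossentropy do above:

# evaluate() runs everything in inference mode; loss and metric are
# computed on identical predictions, so any train-time-only effect disappears
eval_loss, eval_metric = model.evaluate(x=data_x, y=data_y, batch_size=batch_size)
print(eval_loss, eval_metric)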

This is the code of the network:

from tensorflow.python.keras.layers import (Input, Conv2D, Conv2DTranspose, MaxPooling2D,
                                            Activation, ReLU, concatenate)
from tensorflow.python.keras.models import Model


class CustomUnet(object):

    def __init__(self, image_shape=(20, 30, 3), n_class=2, **params):

        # read parameters
        initial_filters = params.get("initial_filters", 64)
        conv_activations = params.get("conv_activations", ReLU())
        final_activation = params.get("final_activation", "softmax")

        self.name = "CustomUnet"
        input_layer = Input(shape=image_shape, name='image_input')

        conv1 = self.conv_block(input_layer, nfilters=initial_filters, activation=conv_activations, name="con1")
        conv1_out = MaxPooling2D(pool_size=(2, 2))(conv1)
        conv2 = self.conv_block(conv1_out, nfilters=initial_filters*2, activation=conv_activations, name="con2")
        conv2_out = MaxPooling2D(pool_size=(2, 2))(conv2)
        conv3 = self.conv_block(conv2_out, nfilters=initial_filters*4, activation=conv_activations, name="con3")
        conv3_out = MaxPooling2D(pool_size=(2, 2))(conv3)
        conv4 = self.conv_block(conv3_out, nfilters=initial_filters*8, activation=conv_activations, name="con4")

        # the numbering jumps from 4 to 7 because the network used to have an extra layer and I haven't refactored it properly yet
        deconv7 = self.deconv_block(conv4, residual=conv3, nfilters=initial_filters*4, name="decon7",
                                    conv_activations=conv_activations)
        deconv8 = self.deconv_block(deconv7, residual=conv2, nfilters=initial_filters*2, name="decon8",
                                    conv_activations=conv_activations)
        deconv9 = self.deconv_block(deconv8, residual=conv1, nfilters=initial_filters, name="decon9",
                                    conv_activations=conv_activations)

        output_layer = Conv2D(filters=n_class, kernel_size=(1, 1))(deconv9)
        # apply the configured final activation (softmax by default)
        output_layer = Activation(final_activation)(output_layer)

        model = Model(inputs=input_layer, outputs=output_layer, name='Unet')
        self.model = model

    def conv_block(self, input_layer, nfilters, size=3, padding='same', initializer="he_normal", name="none",
                   activation=ReLU()):
        x = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding, kernel_initializer=initializer)(input_layer)
        x = Activation(activation)(x)
        x = Conv2D(filters=nfilters, kernel_size=(size, size), padding=padding, kernel_initializer=initializer)(x)
        x = Activation(activation)(x)
        return x

    def deconv_block(self, input_layer, residual, nfilters, size=3, padding='same', strides=(2, 2), name="none",
                     conv_activations=ReLU()):
        y = Conv2DTranspose(nfilters, kernel_size=(size, size), strides=strides, padding=padding)(input_layer)
        y = concatenate([y, residual])
        y = self.conv_block(y, nfilters, activation=conv_activations)
        return y
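For completeness, this is roughly how I instantiate it (a sketch; the parameter values are just examples):

# input dims divisible by 8 so the three pooling/upsampling stages line up
unet = CustomUnet(image_shape=(64, 64, 3), n_class=2, initial_filters=32)
unet.model.compile(optimizer='adam', loss=dice_cross_loss, metrics=[dice_cross_loss])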

Is this expected behavior? What am I not understanding about how the loss and the metric are calculated? Have I messed something up in the code?

Thanks!!

Minimal reproducible example:

from tensorflow.python.keras.layers import Input, Conv2D, Activation
from tensorflow.python.keras.models import Model
import numpy as np

input_data = np.random.rand(100, 300, 300, 3)  # 300x300 images
out_data = np.random.randint(0, 2, size=(100, 300, 300, 4)) # 4 classes

def simple_model(image_shape, n_class):
    input_layer = Input(shape=image_shape, name='image_input')
    x = Conv2D(filters=3, kernel_size=(3, 3), padding="same", kernel_initializer="he_normal")(input_layer)
    x = Activation("relu")(x)
    x = Conv2D(filters=3, kernel_size=(3, 3), padding="same", kernel_initializer="he_normal")(x)
    x = Activation("relu")(x)
    x = Conv2D(filters=n_class, kernel_size=(1, 1))(x)
    output_layer = Activation("softmax")(x)
    model = Model(inputs=input_layer, outputs=output_layer, name='Sample')
    return model

sample_model = simple_model(input_data[0].shape, out_data.shape[-1])

sample_model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["categorical_crossentropy"])

batch_size = 5
steps = input_data.shape[0] // batch_size

epochs = 20

history = sample_model.fit(x=input_data, y=out_data, batch_size=batch_size, epochs=epochs,
         verbose=1, validation_split=0.2, workers=1)

And the results I get still show the weirdness:

80/80 [===] - 9s 108ms/step - loss: 14.0259 - categorical_crossentropy: 2.8051 - val_loss: 13.9439 - val_categorical_crossentropy: 2.7885

So loss: 14.0259 vs. categorical_crossentropy: 2.8051. Now I'm lost...

Juansh L.M.
    Have you noticed that the ratio between your `loss` and your `categorical_crossentropy` is `5.0001426`, pretty close to your batch size? Also, you may want to check https://stackoverflow.com/q/54802328/880783 – bers Feb 21 '19 at 10:01
  • Didn't realize that... might be a change in the way Keras and TF aggregate those values? Anyway, I'm not mixing modules anymore and the problem is gone now! – Juansh L.M. Feb 21 '19 at 15:07

2 Answers


Got a solution working.

It seems to be an issue with how Keras is imported through TensorFlow.

If I do

from tensorflow.python.keras.layers import Input, Conv2D, Activation
from tensorflow.python.keras.models import Model

I get the weird behavior from above.

But if I replace that with

from keras.layers import Input, Conv2D, Activation
from keras.models import Model

I get much more consistent numbers:

 5/80 [>.....] - ETA: 20s - loss: 2.7886 - categorical_crossentropy: 2.7879
10/80 [==>...] - ETA: 12s - loss: 2.7904 - categorical_crossentropy: 2.7899
15/80 [====>.] - ETA: 9s - loss: 2.7900 - categorical_crossentropy: 2.7896 

There are still some differences, but they seem much more reasonable. Still, if you know why, please let me know!
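For reference, the full, consistent set of imports I ended up with (assuming the standalone keras package is installed):

# everything now comes from standalone Keras; nothing mixed with tensorflow.python.keras
from keras.layers import Input, Conv2D, Activation
from keras.models import Model
from keras.losses import categorical_crossentropy
from keras import backend as K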

Juansh L.M.

Keras gets its source of randomness from the NumPy random number generator, so it must be seeded regardless of whether you are using a Theano or TensorFlow backend.

We have to call the seed() function at the top of the file, before any other imports or other code.

from numpy.random import seed
seed(1)

In addition, TensorFlow has its own random number generator that must also be seeded, by calling the set_random_seed() function immediately after seeding the NumPy random number generator, as follows:

from tensorflow import set_random_seed
set_random_seed(2)
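Putting both together at the very top of the script, in that order:

# seed NumPy first, then TensorFlow, before any other imports or other code
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)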

Thanks, Rajeswari Ponnuru.

  • I tried adding this at the top of the module, even before importing anything else, but I still get the same discrepancy: `4/4 [===] - 7s 2s/step - loss: 8.8822 - categorical_crossentropy: 2.2210 - val_loss: 2.0396 - val_categorical_crossentropy: 2.0396`. Just to be clear: the inconsistency is between the `loss` number (which in `compile` is defined as `losses.categorical_crossentropy`) and the `categorical_crossentropy` metric (which in `compile` is defined as `metrics=[losses.categorical_crossentropy]`). `val_loss` and `val_categorical_crossentropy` are weirdly in sync... – Juansh L.M. Dec 17 '18 at 15:13
  • Losses and metrics are not supposed to be random at all, so the random seed should not play a role. Remember, the OP calculates losses and metrics on the same data. – bers Feb 21 '19 at 08:30