
I am using an adapted LeNet model in Keras for binary classification. I have about 250,000 training samples with a 60/40 class ratio. The model trains very well: in the first epoch the accuracy reaches 97 percent with a loss of 0.07, and after 10 epochs the accuracy is over 99 percent with a loss of 0.01. I am using a ModelCheckpoint callback to save the model whenever it improves.

Around the 11th epoch the accuracy suddenly drops to around 55 percent with a loss of around 6. How could this be possible? Is it because the model cannot get any more accurate, so it tries to find better weights but completely fails to do so?

My model is an adaptation on the LeNet model:

from keras import models
from keras.layers import (Convolution2D, MaxPooling2D, Activation,
                          BatchNormalization, Flatten, Dense, Dropout)
from keras.optimizers import Adam

lenet_model = models.Sequential()
lenet_model.add(Convolution2D(filters=filt_size, kernel_size=(kern_size, kern_size), padding='valid',
                              input_shape=input_shape))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
lenet_model.add(Convolution2D(filters=64, kernel_size=(kern_size, kern_size), padding='valid'))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
lenet_model.add(Convolution2D(filters=128, kernel_size=(kern_size, kern_size), padding='valid'))
lenet_model.add(Activation('relu'))
lenet_model.add(BatchNormalization())
lenet_model.add(MaxPooling2D(pool_size=(maxpool_size, maxpool_size)))
lenet_model.add(Flatten())
lenet_model.add(Dense(1024, kernel_initializer='uniform'))
lenet_model.add(Activation('relu'))
lenet_model.add(Dense(512, kernel_initializer='uniform'))
lenet_model.add(Activation('relu'))
lenet_model.add(Dropout(0.2))
lenet_model.add(Dense(n_classes, kernel_initializer='uniform'))
lenet_model.add(Activation('softmax'))

lenet_model.compile(loss='binary_crossentropy', optimizer=Adam(), metrics=['accuracy'])
Wilmar van Ommeren
    Is it the training accuracy or the validation accuracy that drops? It cannot be that it just finds new weights, since gradient descent would not just let the weights change so suddenly. – Michele Tonutti Jun 12 '17 at 07:26
  • The model is trained on all variables, so I am not using validation data in this case. – Wilmar van Ommeren Jun 12 '17 at 07:28
  • Try changing `loss` to `categorical_crossentropy`. Or change the output to have `dim=1` and `activation="sigmoid"`. – Marcin Możejko Jun 12 '17 at 07:38
  • Never even thought about dim=1 with only a 0 and 1 as possible output. Model is currently training! – Wilmar van Ommeren Jun 12 '17 at 07:59
  • @MarcinMożejko Looks like this solves the issue, both with categorical_crossentropy, "softmax", and output dim=2, and with binary_crossentropy, "sigmoid", and output dim=1. Can you explain why the model trains successfully with the previous settings and fails at a random epoch? – Wilmar van Ommeren Jun 12 '17 at 12:25

1 Answer


The problem lies in applying a binary_crossentropy loss where categorical_crossentropy should be applied. Another approach is to keep the binary_crossentropy loss but change the output to dim=1 and the activation to sigmoid. The weird behaviour comes from the fact that with binary_crossentropy you are actually solving a multi-label problem with two independent binary targets, whereas your task is a single binary classification.
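As a plain-NumPy illustration of the difference (made-up numbers, not the question's code): categorical_crossentropy scores only the true class, while binary_crossentropy scores every output unit as its own independent yes/no problem and averages the terms.

```python
import numpy as np

# A one-hot label and a 2-unit softmax prediction for a single sample.
y_true = np.array([1.0, 0.0])
y_pred = np.array([0.8, 0.2])

# categorical_crossentropy: a single term, the log-probability of the true class.
cce = -np.sum(y_true * np.log(y_pred))

# binary_crossentropy applied element-wise to the same vector: each unit is
# treated as an independent binary target, then the two terms are averaged.
bce = -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

print(np.isclose(cce, -np.log(0.8)))  # True
print(np.isclose(bce, cce))           # True here, since softmax outputs sum to 1
```

The loss values happen to coincide when the two softmax outputs are exact complements, but the element-wise formulation is still a different optimisation problem, and (if I read the Keras source right) choosing binary_crossentropy also makes `metrics=['accuracy']` resolve to an element-wise binary accuracy, which would explain why the reported accuracy behaves so strangely.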

Marcin Możejko