As the title describes, my CNN model is still overfitting despite employing Dropout, MaxPooling, EarlyStopping, and L1/L2 regularizers. I have also experimented with various values of learning_rate, dropout_rate, and the L1/L2 regularization weight decay. How can I further prevent overfitting?

Here is the model (using Keras on the TensorFlow backend):

from keras.models import Sequential
from keras.layers import Embedding, Dropout, Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras import regularizers

# Hyperparameters
batch_size = 128
num_epochs = 200
weight_decay = 1e-3
num_filters = 32 * 2
n_kernel_size = 5
num_classes = 3
activation_fn = 'relu'
nb_units = 128
last_dense_units = 128
n_lr = 0.001
n_momentum = 0.99
n_dr = 0.00001
dropout_rate = 0.8

model = Sequential()
model.add(Embedding(nb_words, EMBEDDING_DIM, input_length=max_seq_len))
model.add(Dropout(dropout_rate))
model.add(Conv1D(num_filters, n_kernel_size, padding='same', activation=activation_fn,
                 kernel_regularizer=regularizers.l2(weight_decay)))
model.add(MaxPooling1D())
model.add(GlobalMaxPooling1D())
model.add(Dense(last_dense_units, activation=activation_fn,
                kernel_regularizer=regularizers.l2(weight_decay)))
model.add(Dropout(dropout_rate))
model.add(Dense(num_classes, activation='softmax'))

adam = Adam(lr=n_lr, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=n_dr)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])

early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=3,
    mode='min',
    verbose=1,
    restore_best_weights=True
)

model.fit(...)
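
For completeness, a minimal sketch of how the callback might be wired into training (X_train, y_train, X_val, and y_val are placeholders for my actual arrays, not variables defined above):

# Hypothetical data names; the EarlyStopping callback only takes effect
# when it is passed to fit() via the callbacks argument.
history = model.fit(X_train, y_train,
                    batch_size=batch_size,
                    epochs=num_epochs,
                    validation_data=(X_val, y_val),
                    callbacks=[early_stopping])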

Here are the accuracy plots for training and validation: [plot: training accuracy runs roughly 10% above validation accuracy]

  • This is not overfitting. – desertnaut Jan 27 '21 at 00:20
  • So, what's the definition of this situation? And how can I make the validation accuracy closer to the training accuracy? – talha06 Jan 27 '21 at 00:39
  • This is called "generalization gap" - see (own) answers [here](https://stackoverflow.com/a/61043883/4685471) and [here](https://stackoverflow.com/a/58468274/4685471). As for how we close it, well, this is exactly the billion dollar question...! – desertnaut Jan 27 '21 at 01:22

1 Answer

There are still some methods to combat overfitting that you could try.

Your model does seem to be overfitting by about 10%. But how much overfitting is too much overfitting? I would look at this post and the related conversation so you can best evaluate your specific situation.
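
For instance, a minimal sketch of a slimmed-down variant of your model (reusing nb_words, EMBEDDING_DIM, max_seq_len, and num_classes from your code): it reduces capacity and swaps the first Dropout for SpatialDropout1D, which drops whole embedding channels and often regularizes text models better than element-wise Dropout. The specific values here are illustrative, not tuned:

from keras.models import Sequential
from keras.layers import (Embedding, SpatialDropout1D, Conv1D,
                          GlobalMaxPooling1D, Dense, Dropout)
from keras import regularizers

model = Sequential()
model.add(Embedding(nb_words, EMBEDDING_DIM, input_length=max_seq_len))
model.add(SpatialDropout1D(0.3))                       # channel-wise dropout
model.add(Conv1D(32, 5, padding='same', activation='relu',  # fewer filters
                 kernel_regularizer=regularizers.l2(1e-3)))
model.add(GlobalMaxPooling1D())
model.add(Dense(64, activation='relu',                 # smaller dense layer
                kernel_regularizer=regularizers.l2(1e-3)))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))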

  • Yes, exactly, this 10% is critical for my research. #2 was already done, as my test set is separate from the training and validation sets. #3 & #4 have not been applied, since the input is text. – talha06 Jan 26 '21 at 20:48
  • If your input is text, data augmentation is still an option; see https://arxiv.org/pdf/1901.11196.pdf – raceee Jan 26 '21 at 20:53
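
A minimal sketch of two of the EDA operations from that paper, random swap and random deletion, which need no external resources (the helper names here are made up for illustration):

import random

def random_swap(words, n_swaps=1):
    """Randomly swap the positions of two words, n_swaps times (EDA 'RS')."""
    words = words[:]
    for _ in range(n_swaps):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Drop each word independently with probability p (EDA 'RD')."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]  # never return an empty text

def augment(sentence, n_copies=4):
    """Produce n_copies augmented variants of one training sentence."""
    words = sentence.split()
    out = []
    for _ in range(n_copies):
        variant = random_deletion(random_swap(words, n_swaps=1), p=0.1)
        out.append(' '.join(variant))
    return out

print(augment("my cnn model is still overfitting despite dropout"))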