
For my thesis, I'm running a four-layer deep network for a sequence-to-sequence translation use case: 150 (timesteps) x Conv(64, 5) x GRU(100) x softmax activation on the last stage, with loss='categorical_crossentropy'.

Training loss and accuracy converge quickly, whereas validation loss and accuracy seem to be stuck: val_acc stays in the 97 to 98.2 range, unable to go past that.

Is my model overfitting?

I have tried a dropout of 0.2 between layers.
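
In Keras terms, the model is roughly the following sketch; the input feature dimension, number of output classes, and optimizer are placeholders here, not the exact values used:

    # Minimal sketch of the described model: 150 timesteps -> Conv(64,5)
    # -> GRU(100) -> per-timestep softmax, with 0.2 dropout between layers.
    # Feature dimension, class count and optimizer are placeholders.
    from keras.models import Sequential
    from keras.layers import Conv1D, Dense, Dropout, GRU, TimeDistributed

    n_timesteps, n_features, n_classes = 150, 64, 40  # hypothetical sizes

    model = Sequential([
        Conv1D(64, 5, padding='same', activation='relu',
               input_shape=(n_timesteps, n_features)),
        Dropout(0.2),
        GRU(100, return_sequences=True),
        Dropout(0.2),
        TimeDistributed(Dense(n_classes, activation='softmax')),
    ])
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])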

Output after adding dropout:
    Epoch 85/250
    [==============================] - 3s - loss: 0.0057 - acc: 0.9996 - val_loss: 0.2249 - val_acc: 0.9774
    Epoch 86/250
    [==============================] - 3s - loss: 0.0043 - acc: 0.9987 - val_loss: 0.2063 - val_acc: 0.9774
    Epoch 87/250
    [==============================] - 3s - loss: 0.0039 - acc: 0.9987 - val_loss: 0.2180 - val_acc: 0.9809
    Epoch 88/250
    [==============================] - 3s - loss: 0.0075 - acc: 0.9978 - val_loss: 0.2272 - val_acc: 0.9774
    Epoch 89/250
    [==============================] - 3s - loss: 0.0078 - acc: 0.9974 - val_loss: 0.2265 - val_acc: 0.9774
    Epoch 90/250
    [==============================] - 3s - loss: 0.0027 - acc: 0.9996 - val_loss: 0.2212 - val_acc: 0.9809
    Epoch 91/250
    [==============================] - 3s - loss: 3.2185e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 92/250
    [==============================] - 3s - loss: 0.0020 - acc: 0.9991 - val_loss: 0.2239 - val_acc: 0.9792
    Epoch 93/250
    [==============================] - 3s - loss: 0.0047 - acc: 0.9987 - val_loss: 0.2163 - val_acc: 0.9809
    Epoch 94/250
    [==============================] - 3s - loss: 2.1863e-04 - acc: 1.0000 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 95/250
    [==============================] - 3s - loss: 0.0011 - acc: 0.9996 - val_loss: 0.2190 - val_acc: 0.9809
    Epoch 96/250
    [==============================] - 3s - loss: 0.0040 - acc: 0.9987 - val_loss: 0.2289 - val_acc: 0.9792
    Epoch 97/250
    [==============================] - 3s - loss: 2.9621e-04 - acc: 1.0000 - val_loss: 0.2360 - val_acc: 0.9792
    Epoch 98/250
    [==============================] - 3s - loss: 4.3776e-04 - acc: 1.0000 - val_loss: 0.2437 - val_acc: 0.9774

2 Answers


The case you presented is really a complex one. To decide whether overfitting is actually happening in your case, you need to answer two questions:

  1. Are the results obtained on the validation set satisfying? The main purpose of a validation set is to give you insight into what will happen when new data arrives. If you are satisfied with the accuracy on the validation set, then you should not consider your model to be overfitting too much.
  2. Should I worry about the extremely high accuracy of my model on the training set? You may easily notice that your model is almost perfect on the training set. This could mean that it has learned some patterns by heart. There is usually some noise in your data, and a model that is perfect on the training data is probably using part of its capacity to learn that noise (bias). To test this, I usually inspect the positive examples with the lowest scores and the negative examples with the highest scores, as outliers usually fall into these two groups (the model struggles to push them above/below the 0.5 threshold); see the sketch after this list.
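
For example, a quick way to pull out those suspicious examples (a sketch for a binary setup; x_val and y_val are placeholder names for the validation data):

    import numpy as np

    # Predicted positive-class probabilities on the validation set
    # (binary setup; x_val / y_val are placeholder names).
    probs = model.predict(x_val).ravel()

    pos = np.where(y_val == 1)[0]   # indices of positive examples
    neg = np.where(y_val == 0)[0]   # indices of negative examples

    # Positives with the lowest scores and negatives with the highest
    # scores are the likeliest outliers / memorised-noise candidates.
    suspect_pos = pos[np.argsort(probs[pos])[:10]]
    suspect_neg = neg[np.argsort(-probs[neg])[:10]]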

So, after checking these two concerns, you can tell whether your model overfits. The behaviour you presented is actually quite good, and the actual reason behind it could be that there are a few patterns in the validation set which are not properly covered by the training set. But this is something you should always take into account when designing a machine learning solution.

Marcin Możejko
  • Took time to check different permutations of the model. Your observation that the network is using part of its capacity to learn bias is correct; I tried various model-capacity reduction percentages to verify it. At lower capacity, both training and validation accuracy move in tandem. Your second observation, that the validation set has unique patterns, is also true; it took time to manually verify both data sets. – Ajay Jul 20 '17 at 15:13
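
A capacity-reduced variant along the lines this comment describes might look like the following sketch (reusing the placeholder names from the question; the halved filter and unit counts are illustrative, not the percentages actually tried):

    # Hypothetical lower-capacity variant (filters and units halved) used
    # to check whether training and validation accuracy move in tandem.
    model_small = Sequential([
        Conv1D(32, 5, padding='same', activation='relu',
               input_shape=(n_timesteps, n_features)),
        Dropout(0.2),
        GRU(50, return_sequences=True),
        Dropout(0.2),
        TimeDistributed(Dense(n_classes, activation='softmax')),
    ])
    model_small.compile(loss='categorical_crossentropy', optimizer='adam',
                        metrics=['accuracy'])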

No, this is not overfitting. Overfitting only happens when the training loss is low and the validation loss is high. It can also be seen as a large gap between training and validation accuracy (in the case of classification).
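
As a quick numeric check of that gap, here is a sketch assuming history is the object returned by model.fit (the metric keys are 'acc'/'val_acc' in the Keras version shown in the logs, 'accuracy'/'val_accuracy' in newer tf.keras):

    # Gap between training and validation metrics at the last epoch;
    # a large positive gap is the usual overfitting signature.
    loss_gap = history.history['val_loss'][-1] - history.history['loss'][-1]
    acc_gap = history.history['acc'][-1] - history.history['val_acc'][-1]
    print('loss gap: %.4f  accuracy gap: %.4f' % (loss_gap, acc_gap))

With the logs in the question, this gives a loss gap of roughly 0.24 but an accuracy gap of only about 0.02.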

Dr. Snoopy