CNN train accuracy gets better during training, but test accuracy stays around 40%

Question

So in the past few months I've been learning a lot about neural networks with Tensorflow and Keras, so I wanted to try to make a model for the CIFAR10 dataset (code below).

However, during the training process, the accuracy gets better (from about 35% after 1 epoch to about 60-65% after 5 epochs), but the val_acc stays the same or increases only a little. Here are the printed results:

Epoch 1/5
50000/50000 [==============================] - 454s 9ms/step - loss: 1.7761 - acc: 0.3584 - val_loss: 8.6776 - val_acc: 0.4489
Epoch 2/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.3670 - acc: 0.5131 - val_loss: 8.9749 - val_acc: 0.4365
Epoch 3/5
50000/50000 [==============================] - 451s 9ms/step - loss: 1.2089 - acc: 0.5721 - val_loss: 7.7254 - val_acc: 0.5118
Epoch 4/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.1140 - acc: 0.6080 - val_loss: 7.9587 - val_acc: 0.4997
Epoch 5/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.0306 - acc: 0.6385 - val_loss: 7.4351 - val_acc: 0.5321
10000/10000 [==============================] - 27s 3ms/step
loss:  7.435152648162842 
accuracy:  0.5321

I've looked around on the internet and my best guess is that my model is overfitted, so I've tried removing some layers, adding more dropout layers and reducing the amount of filters, but none showed any enhancement.

The weirdest thing is that a while ago I made a very similar model, based on some tutorials, which had a final accuracy of 80% after 8 epochs. (I lost that file though)

Here is the code of my model:

model = Sequential()
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 data_format='channels_last',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=128,
                 kernel_size=(2, 2),
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))


model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=1000,
          epochs=5,
          verbose=1,
          validation_data=(test_images, test_labels))

loss, accuracy = model.evaluate(test_images, test_labels)
print('loss: ', loss, '\naccuracy: ', accuracy)

train_images and test_images are numpy arrays of size (50000,32,32,3) and (10000,32,32,3) and train_labels and test_labels are numpy arrays of size (50000,10) and (10000,10).

My question: what causes this and what can I do about it?

Edit after Maxim's answer:

I changed the model to this:

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',    # better for relu based networks
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))

and the output is now this:

Epoch 1/10
50000/50000 [==============================] - 326s 7ms/step - loss: 1.4916 - acc: 0.4809 - val_loss: 7.7175 - val_acc: 0.5134
Epoch 2/10
50000/50000 [==============================] - 338s 7ms/step - loss: 1.0622 - acc: 0.6265 - val_loss: 6.9945 - val_acc: 0.5588
Epoch 3/10
50000/50000 [==============================] - 326s 7ms/step - loss: 0.8957 - acc: 0.6892 - val_loss: 6.6270 - val_acc: 0.5833
Epoch 4/10
50000/50000 [==============================] - 324s 6ms/step - loss: 0.7813 - acc: 0.7271 - val_loss: 5.5790 - val_acc: 0.6474
Epoch 5/10
50000/50000 [==============================] - 327s 7ms/step - loss: 0.6690 - acc: 0.7668 - val_loss: 5.7479 - val_acc: 0.6358
Epoch 6/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.5671 - acc: 0.8031 - val_loss: 5.8720 - val_acc: 0.6302
Epoch 7/10
50000/50000 [==============================] - 328s 7ms/step - loss: 0.4865 - acc: 0.8319 - val_loss: 5.6320 - val_acc: 0.6451
Epoch 8/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3995 - acc: 0.8611 - val_loss: 5.3879 - val_acc: 0.6615
Epoch 9/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3337 - acc: 0.8837 - val_loss: 5.6874 - val_acc: 0.6432
Epoch 10/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.2806 - acc: 0.9033 - val_loss: 5.7424 - val_acc: 0.6399
10000/10000 [==============================] - 19s 2ms/step
loss:  5.74234927444458 
accuracy:  0.6399

It seems that I'm overfitting again, even though I changed the model with the help I've gotten so far... Any explanations or tips?

The input images are (32,32,3) numpy arrays normalized to (0,1)

reduce batchsize to be about 128 or 64. Decrease the size of dense layer may be 512 or 384 — Umang Gupta, Feb 03 '18 at 09:06
Thanks for the answer, I thought batch size only had an impact on the memory, but it seems to have a bigger impact on the training than I thought. I'm retraining the model now with a smaller batch size, but it might take a really long time.. I'll keep you updated! — wohe1, Feb 03 '18 at 09:19

Maxim · Accepted Answer · 2018-02-03T09:43:13.097

You haven't included how you prepare the data, here's one addition that made this network learn much better:

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

If you do data normalization like that, then your network is fine: it hits ~65-70% test accuracy after 5 epochs, which is a good result. Note that 5 epochs is just a start, it would need around 30-50 epochs to really learn the data well and show a result close to state of the art.

Below are some minor improvements that I noticed and can get you extra performance points:

Since you're using ReLu based network, he_normal initializer is better than glorot_uniform (which is a default in Conv2D).
It is strange to decrease the number of filters as you go deeper in the network. You should do right the opposite. I changed 256 -> 64 and 128 -> 256 and the accuracy improved.
I decreased the dropout slightly 0.5 -> 0.4.
Kernel size 3x3 is more common than 2x2. I think you should try it for the second conv layer as well. In fact, you can play with all hyper-parameters to find the best combination.

Here's the final code:

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(2, 2),
                 kernel_initializer='he_normal',
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=500,
          epochs=5,
          verbose=1,
          validation_data=(x_test, y_test))

loss, accuracy = model.evaluate(x_test, y_test)
print('loss: ', loss, '\naccuracy: ', accuracy)

The result after 5 epochs:

loss:  0.822134458447 
accuracy:  0.7126

By the way, you might be interested to compare your approach with keras example CIFAR-10 conv net.

Thanks for the answer, my data is already normalized, but I'll try your other remarks as soon as my current training process is done — wohe1, Feb 03 '18 at 09:43
I've edited my question with my new model and output, could you please look at it? — wohe1, Feb 03 '18 at 22:32
@wohe1 did you try my code exactly? Because my result was different. I wonder if it's the code you didn't include — Maxim, Feb 03 '18 at 23:02
I have put all my code here: https://github.com/wohe157/cifar10 (also the preprocessing) — wohe1, Feb 03 '18 at 23:40

CNN train accuracy gets better during training, but test accuracy stays around 40%

Edit after Maxim's answer:

1 Answers1

Linked