Over the past few months I've been learning about neural networks with TensorFlow and Keras, and I wanted to build a model for the CIFAR10 dataset (code below).
However, during training the training accuracy improves (from about 35% after 1 epoch to about 60-65% after 5 epochs), while val_acc stays roughly the same or increases only slightly. Here is the printed output:
Epoch 1/5
50000/50000 [==============================] - 454s 9ms/step - loss: 1.7761 - acc: 0.3584 - val_loss: 8.6776 - val_acc: 0.4489
Epoch 2/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.3670 - acc: 0.5131 - val_loss: 8.9749 - val_acc: 0.4365
Epoch 3/5
50000/50000 [==============================] - 451s 9ms/step - loss: 1.2089 - acc: 0.5721 - val_loss: 7.7254 - val_acc: 0.5118
Epoch 4/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.1140 - acc: 0.6080 - val_loss: 7.9587 - val_acc: 0.4997
Epoch 5/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.0306 - acc: 0.6385 - val_loss: 7.4351 - val_acc: 0.5321
10000/10000 [==============================] - 27s 3ms/step
loss: 7.435152648162842
accuracy: 0.5321
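A reference point for reading these numbers: a 10-class classifier that outputs a uniform distribution has a categorical cross-entropy of ln(10), about 2.30. A val_loss of 7 to 8 next to roughly 50% val_acc therefore suggests that the wrong predictions are made with near-total confidence. The sketch below uses plain NumPy on made-up probabilities (not my model's outputs) to show how that combination can arise. The `eps=1e-7` clip is an assumption modeled on the default Keras epsilon.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy for one-hot labels; eps clip is an assumed Keras-like default."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

num_classes = 10
labels = np.eye(num_classes)[[0, 1]]  # two samples, true classes 0 and 1

# Uniform guess over 10 classes: loss equals ln(10)
uniform = np.full((2, num_classes), 1.0 / num_classes)
uniform_loss = categorical_crossentropy(labels, uniform)

# Saturated outputs: sample 0 is right, sample 1 is confidently wrong
saturated = np.zeros((2, num_classes))
saturated[0, 0] = 1.0
saturated[1, 5] = 1.0
saturated_loss = categorical_crossentropy(labels, saturated)
saturated_acc = float(np.mean(saturated.argmax(axis=1) == labels.argmax(axis=1)))

print(uniform_loss, saturated_loss, saturated_acc)
```

In this toy case half the predictions are correct, yet the mean loss is far above ln(10), because each confidently wrong sample contributes -log(eps). That is only an illustration of the pattern, not a diagnosis of my model.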
I've looked around on the internet and my best guess is that my model is overfitting, so I've tried removing some layers, adding more dropout layers and reducing the number of filters, but none of these changes improved the validation accuracy.
The weirdest thing is that a while ago I made a very similar model, based on some tutorials, which had a final accuracy of 80% after 8 epochs. (I lost that file though)
Here is the code of my model:
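# Imports (assumed: standalone Keras 2.x, where the lowercase adam alias exists)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import adam
from keras.losses import categorical_crossentropy

# Two conv/pool stages followed by a dense classifier
model = Sequential()
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 data_format='channels_last',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=128,
                 kernel_size=(2, 2),
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

# The test set doubles as the validation set here
model.fit(train_images, train_labels,
          batch_size=1000,
          epochs=5,
          verbose=1,
          validation_data=(test_images, test_labels))

loss, accuracy = model.evaluate(test_images, test_labels)
print('loss: ', loss, '\naccuracy: ', accuracy)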
train_images and test_images are numpy arrays of shape (50000, 32, 32, 3) and (10000, 32, 32, 3), and train_labels and test_labels are numpy arrays of shape (50000, 10) and (10000, 10).
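For reference, labels of shape (N, 10) like these are one-hot vectors. A minimal NumPy sketch of that encoding, assuming integer class ids in the range 0 to 9 (the helper name `to_one_hot` and the sample ids are illustrative, not taken from my script):

```python
import numpy as np

def to_one_hot(class_ids, num_classes=10):
    """Convert integer class ids of shape (N,) or (N, 1) to one-hot rows of shape (N, num_classes)."""
    class_ids = np.asarray(class_ids).reshape(-1)
    one_hot = np.zeros((class_ids.shape[0], num_classes), dtype=np.float32)
    # Set a single 1.0 per row at the column given by the class id
    one_hot[np.arange(class_ids.shape[0]), class_ids] = 1.0
    return one_hot

sample_ids = np.array([[3], [0], [9]])  # hypothetical labels, shaped (N, 1)
encoded = to_one_hot(sample_ids)
print(encoded.shape)
```

Keras provides `keras.utils.to_categorical` for the same purpose.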
My question: what causes this and what can I do about it?
Edit after Maxim's answer:
I changed the model to this:
model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',  # better for relu based networks
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))
and the output is now this:
Epoch 1/10
50000/50000 [==============================] - 326s 7ms/step - loss: 1.4916 - acc: 0.4809 - val_loss: 7.7175 - val_acc: 0.5134
Epoch 2/10
50000/50000 [==============================] - 338s 7ms/step - loss: 1.0622 - acc: 0.6265 - val_loss: 6.9945 - val_acc: 0.5588
Epoch 3/10
50000/50000 [==============================] - 326s 7ms/step - loss: 0.8957 - acc: 0.6892 - val_loss: 6.6270 - val_acc: 0.5833
Epoch 4/10
50000/50000 [==============================] - 324s 6ms/step - loss: 0.7813 - acc: 0.7271 - val_loss: 5.5790 - val_acc: 0.6474
Epoch 5/10
50000/50000 [==============================] - 327s 7ms/step - loss: 0.6690 - acc: 0.7668 - val_loss: 5.7479 - val_acc: 0.6358
Epoch 6/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.5671 - acc: 0.8031 - val_loss: 5.8720 - val_acc: 0.6302
Epoch 7/10
50000/50000 [==============================] - 328s 7ms/step - loss: 0.4865 - acc: 0.8319 - val_loss: 5.6320 - val_acc: 0.6451
Epoch 8/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3995 - acc: 0.8611 - val_loss: 5.3879 - val_acc: 0.6615
Epoch 9/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3337 - acc: 0.8837 - val_loss: 5.6874 - val_acc: 0.6432
Epoch 10/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.2806 - acc: 0.9033 - val_loss: 5.7424 - val_acc: 0.6399
10000/10000 [==============================] - 19s 2ms/step
loss: 5.74234927444458
accuracy: 0.6399
It seems that I'm overfitting again, even though I changed the model with the help I've gotten so far... Any explanations or tips?
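To put a number on the gap, here is a small plain-Python summary of the 10-epoch log above. The accuracies are copied from that output; the variable names are just for illustration.

```python
# Per-epoch accuracies copied from the 10-epoch training log
train_acc = [0.4809, 0.6265, 0.6892, 0.7271, 0.7668,
             0.8031, 0.8319, 0.8611, 0.8837, 0.9033]
val_acc = [0.5134, 0.5588, 0.5833, 0.6474, 0.6358,
           0.6302, 0.6451, 0.6615, 0.6432, 0.6399]

# Epoch (1-based) with the highest validation accuracy
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1

# Train-minus-validation accuracy gap per epoch
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]
final_gap = gaps[-1]

print(best_epoch, final_gap)
```

Validation accuracy peaks at epoch 8 and then drifts down while training accuracy keeps rising, so the gap widens every epoch after the first few.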
The input images are (32, 32, 3) numpy arrays normalized to (0, 1).
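Since the training and test arrays need to be scaled in exactly the same way, here is a minimal sketch of the kind of sanity check that can confirm it. It assumes the raw images are uint8 values in 0 to 255; the function names and the random stand-in arrays are hypothetical, not my actual data.

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixel values from [0, 255] to float32 in [0, 1]."""
    return images.astype(np.float32) / 255.0

def same_scale(a, b, tol=0.1):
    """Rough check that two image arrays share a value range (compares min and max)."""
    return (abs(float(a.max()) - float(b.max())) <= tol
            and abs(float(a.min()) - float(b.min())) <= tol)

# Random stand-ins for the real CIFAR10 arrays (assumption: raw data is uint8)
rng = np.random.RandomState(0)
raw_train = rng.randint(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
raw_test = rng.randint(0, 256, size=(4, 32, 32, 3)).astype(np.uint8)

train_scaled = normalize(raw_train)
test_scaled = normalize(raw_test)

consistent = same_scale(train_scaled, test_scaled)  # both sets normalized
mismatch = same_scale(train_scaled, raw_test)       # test set left at 0..255

print(consistent, mismatch)
```

If a check like this fails on the real arrays, the validation metrics are computed on inputs the network never saw during training, which would make them hard to compare with the training metrics.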