
I am training a neural network and get the following output. The loss and val_loss are both decreasing, which makes me happy. However, the val_acc stays constant. What could be the reasons for that? My data is quite imbalanced, but I am weighting it via the sklearn compute_class_weight function.
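
For reference, the weighting looks roughly like this (a minimal sketch; x_train, y_train, x_val and y_val stand in for my actual arrays, and replica is the compiled model shown further down):

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # y_train holds the integer class labels of the training set (placeholder name)
    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
    class_weights = dict(zip(classes, weights))

    # The resulting dict is passed to Keras via the class_weight argument of fit()
    replica.fit(x_train, y_train, epochs=200, batch_size=512,
                validation_data=(x_val, y_val), class_weight=class_weights, verbose=2)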

Train on 109056 samples, validate on 27136 samples
Epoch 1/200
- 1174s - loss: 1.0353 - acc: 0.5843 - val_loss: 1.0749 - val_acc: 0.7871

Epoch 00001: val_acc improved from -inf to 0.78711, saving model to nn_best_weights.h5
Epoch 2/200
- 1174s - loss: 1.0122 - acc: 0.6001 - val_loss: 1.0642 - val_acc: 0.9084

Epoch 00002: val_acc improved from 0.78711 to 0.90842, saving model to nn_best_weights.h5
Epoch 3/200
- 1176s - loss: 0.9974 - acc: 0.5885 - val_loss: 1.0445 - val_acc: 0.9257

Epoch 00003: val_acc improved from 0.90842 to 0.92571, saving model to nn_best_weights.h5
Epoch 4/200
- 1177s - loss: 0.9834 - acc: 0.5760 - val_loss: 1.0071 - val_acc: 0.9260

Epoch 00004: val_acc improved from 0.92571 to 0.92597, saving model to nn_best_weights.h5
Epoch 5/200
- 1182s - loss: 0.9688 - acc: 0.5639 - val_loss: 1.0175 - val_acc: 0.9260

Epoch 00005: val_acc did not improve from 0.92597
Epoch 6/200
- 1177s - loss: 0.9449 - acc: 0.5602 - val_loss: 0.9976 - val_acc: 0.9246

Epoch 00006: val_acc did not improve from 0.92597
Epoch 7/200
- 1186s - loss: 0.9070 - acc: 0.5598 - val_loss: 0.9667 - val_acc: 0.9258

Epoch 00007: val_acc did not improve from 0.92597
Epoch 8/200
- 1178s - loss: 0.8541 - acc: 0.5663 - val_loss: 0.9254 - val_acc: 0.9221

Epoch 00008: val_acc did not improve from 0.92597
Epoch 9/200
- 1171s - loss: 0.7859 - acc: 0.5853 - val_loss: 0.8686 - val_acc: 0.9237

Epoch 00009: val_acc did not improve from 0.92597
Epoch 10/200
- 1172s - loss: 0.7161 - acc: 0.6139 - val_loss: 0.8119 - val_acc: 0.9260

Epoch 00010: val_acc did not improve from 0.92597
Epoch 11/200
- 1168s - loss: 0.6500 - acc: 0.6416 - val_loss: 0.7531 - val_acc: 0.9259

Epoch 00011: val_acc did not improve from 0.92597
Epoch 12/200
- 1164s - loss: 0.5967 - acc: 0.6676 - val_loss: 0.7904 - val_acc: 0.9260

Epoch 00012: val_acc did not improve from 0.92597
Epoch 13/200
- 1175s - loss: 0.5608 - acc: 0.6848 - val_loss: 0.7589 - val_acc: 0.9259

Epoch 00013: val_acc did not improve from 0.92597
Epoch 14/200
- 1221s - loss: 0.5377 - acc: 0.6980 - val_loss: 0.7811 - val_acc: 0.9260

Epoch 00014: val_acc did not improve from 0.92597

My model is the following. I know the kernel size is quite large, but that is on purpose because the data is structured in a certain way.

    from keras.models import Sequential, Model
    from keras.layers import (Input, Conv2D, LeakyReLU, BatchNormalization,
                              Flatten, LSTM, Dense, TimeDistributed)

    # CNN feature extractor applied to every time step
    cnn = Sequential()
    cnn.add(Conv2D(16, kernel_size=(2, 100), padding='same',
                   data_format="channels_first", input_shape=(1, 10, 100)))
    cnn.add(LeakyReLU(alpha=0.01))
    cnn.add(BatchNormalization())
    cnn.add(Conv2D(16, (1, 1)))
    cnn.add(LeakyReLU(alpha=0.01))
    cnn.add(Conv2D(16, (1, 8)))
    cnn.add(Flatten())

    # Recurrent part and classification head
    rnn = LSTM(100, return_sequences=False, dropout=0.2)
    dense = Sequential()
    dense.add(Dense(3, activation='softmax'))

    # Assemble the full model with the functional API
    main_input = Input(batch_shape=(512, 1, 1, 10, 100))
    model = TimeDistributed(cnn)(main_input)
    model = rnn(model)
    model = dense(model)
    replica = Model(inputs=main_input, outputs=model)
    replica.compile(loss='categorical_crossentropy', optimizer='adam',
                    metrics=['accuracy'])
desertnaut
freddy888
  • At least in principle, there is nothing *necessarily* strange with decreasing loss and non-increasing accuracy, especially if the accuracy is already relatively high ( > 0.90); the answer in [Loss & accuracy - Are these reasonable learning curves?](https://stackoverflow.com/questions/47817424/loss-accuracy-are-these-reasonable-learning-curves/47819022#47819022) may be helpful. – desertnaut Mar 08 '19 at 18:31

2 Answers


It is hard to answer your question without knowing your model.

The possible answers are:

  • There is nothing wrong with your model. This may be the highest accuracy you can get.
  • Your data may be imbalanced or not shuffled. A higher val_acc than acc indicates that there may be something wrong with the train/validation/test split. Training accuracy tends to be higher than val_acc at the beginning; then val_acc catches up, or not ;) It can also indicate that there is not much variance in your dataset, in which case you can get this kind of behaviour.
  • Your learning rate may be too big. Try to decrease it (see the sketch after this list).
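
A minimal sketch of the last point, assuming the standalone Keras API used in the question; model stands for your compiled network (called replica in the question), and the value 1e-4 is only an example:

    from keras.optimizers import Adam

    # Recompile with a smaller learning rate than the Adam default of 0.001
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=1e-4),
                  metrics=['accuracy'])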

Also, the actual quantity the model minimizes is the loss, so during the optimization process you should follow the loss and monitor its improvements, for example by checkpointing on val_loss instead of val_acc, as sketched below.
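
A minimal sketch, assuming the Keras ModelCheckpoint callback; x_train, y_train, x_val and y_val are placeholders for your own data:

    from keras.callbacks import ModelCheckpoint

    # Save the best weights based on val_loss instead of val_acc
    checkpoint = ModelCheckpoint('nn_best_weights.h5', monitor='val_loss',
                                 save_best_only=True, verbose=1)

    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              epochs=200, batch_size=512, callbacks=[checkpoint], verbose=2)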

Check this link for more information on how to check your model.

desertnaut
RKO
  • Thanks for the input. I posted the model. Indeed, my data is not shuffled since I am using time-series data and an LSTM. Would you still advise shuffling? – freddy888 Mar 08 '19 at 17:02
  • I do not know how your dataset is constructed. If shuffling your dataset won't break your sequence (it won't if you classify sentences and there is one sentence per line), then you should shuffle your data to be sure that there is variance in it. Looking at the results, there seems to be some problem with it. – RKO Mar 08 '19 at 17:15
  • I added a link to an article that may help you with brushing up your model. – RKO Mar 08 '19 at 17:17
  • I see you're using the standard Adam optimizer. Try to customize it and lower the learning rate. Your results may indicate that you are learning too fast. – RKO Mar 08 '19 at 17:33
  • It could also be that the network finds it easier to reduce the loss function by memorising the training data rather than generalising. That can come from issues with the data (where shuffling could help), or the model could be overfitting. If it is the latter, you may find the following useful: https://www.tensorflow.org/tutorials/keras/overfit_and_underfit – Pedro Marques Mar 08 '19 at 23:08

It seems to be a case where the learning rate is too high, overshooting the local minimum and preventing the neural network from improving its learning:

[Image: Gradient]

It would be good if you could customize your optimizer, like this:

    from keras.optimizers import SGD

    # A lower learning rate with slight decay and momentum
    learning_rate = 0.008
    decay_rate = 5e-6
    momentum = 0.65

    sgd = SGD(lr=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
    model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=['accuracy'])

Also, increase the number of convolutions; the weights may be saturated.
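
For instance (one possible reading of this suggestion; the filter counts 32 and 64 are hypothetical), widening the CNN from the question:

    from keras.models import Sequential
    from keras.layers import Conv2D, LeakyReLU, BatchNormalization, Flatten

    # Hypothetical: more filters per convolution (32/64 instead of 16)
    cnn = Sequential()
    cnn.add(Conv2D(32, kernel_size=(2, 100), padding='same',
                   data_format="channels_first", input_shape=(1, 10, 100)))
    cnn.add(LeakyReLU(alpha=0.01))
    cnn.add(BatchNormalization())
    cnn.add(Conv2D(64, (1, 1)))
    cnn.add(LeakyReLU(alpha=0.01))
    cnn.add(Conv2D(64, (1, 8)))
    cnn.add(Flatten())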

razimbres