
No matter which optimizer or which accuracy/loss metrics I use, my accuracy converges quickly (within 10-20 epochs) while my loss continues to decrease (>100 epochs). I've tried every optimizer available in Keras and the same trend occurs (although some converge less quickly and with slightly higher accuracy than others, with Nadam, Adadelta and Adamax performing the best).

My input is a 64x1 data vector and my output is a 3x1 vector representing 3D coordinates in real space. I have about 2000 training samples and 500 test samples. I've normalized both the inputs and the outputs using MinMaxScaler from the scikit-learn preprocessing toolbox, and I also shuffle my data using the scikit-learn shuffle function. I use train_test_split to split the data (with a specified random state).
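
Roughly, the preprocessing looks like this (a simplified sketch; X_raw, y_raw and the exact split fraction are placeholders, not my literal code):

from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

# X_raw: (N, 64) input vectors, y_raw: (N, 3) target coordinates
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X = x_scaler.fit_transform(X_raw)
y = y_scaler.fit_transform(y_raw)

# shuffle, then split off a test set with a fixed random state
X, y = shuffle(X, y, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Conv1D expects (samples, steps, channels), so add a channel dimension
x_train = x_train.reshape(-1, 64, 1)
x_test = x_test.reshape(-1, 64, 1)

Here's my CNN: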

from keras.models import Model
from keras.layers import Input, Conv1D, MaxPooling1D, Dense, Dropout
from keras.optimizers import Nadam, Adam

def cnn(pretrained_weights = None, input_size = (64,1)):
    inputs = Input(input_size)

    conv1 = Conv1D(64,2,strides=1,activation='relu')(inputs)
    conv2 = Conv1D(64,2,strides=1,activation='relu')(conv1)
    pool1 = MaxPooling1D(pool_size=2)(conv2)
    #pool1 = Dropout(0.25)(pool1)

    conv3 = Conv1D(128,2,strides=1,activation='relu')(pool1)
    conv4 = Conv1D(128,2,strides=1,activation='relu')(conv3)
    pool2 = MaxPooling1D(pool_size=2)(conv4)
    #pool2 = Dropout(0.25)(pool2)

    conv5 = Conv1D(256,2,strides=1,activation='relu')(pool2)
    conv6 = Conv1D(256,2,strides=1,activation='relu')(conv5)
    pool3 = MaxPooling1D(pool_size=2)(conv6)
    #pool3 = Dropout(0.25)(pool3)
    pool4 = MaxPooling1D(pool_size=2)(pool3)

    dense1 = Dense(256,activation='relu')(pool4)
    #drop1 = Dropout(0.5)(dense1)
    drop1 = dense1
    dense2 = Dense(64,activation='relu')(drop1)
    #drop2 = Dropout(0.5)(dense2)
    drop2 = dense2
    dense3 = Dense(32,activation='relu')(drop2)
    dense4 = Dense(1,activation='sigmoid')(dense3)

    model = Model(inputs = inputs, outputs = dense4)

    #opt = Adam(lr=1e-6,clipvalue=0.01)
    model.compile(optimizer = Nadam(lr=1e-4), loss = 'mse', metrics = ['accuracy','mse','mae'])

    return model

I tried additional pooling (as can be seen in my code) to regularize my data and reduce overfitting (in case that's the problem) but to no avail. Here's a training example using the parameters above:

model = cnn()
model.fit(x=x_train, y=y_train, batch_size=7, epochs=10, verbose=1, validation_split=0.2, shuffle=True)

Train on 1946 samples, validate on 487 samples
Epoch 1/10
1946/1946 [==============================] - 5s 3ms/step - loss: 0.0932 - acc: 0.0766 - mean_squared_error: 0.0932 - mean_absolute_error: 0.2616 - val_loss: 0.0930 - val_acc: 0.0815 - val_mean_squared_error: 0.0930 - val_mean_absolute_error: 0.2605
Epoch 2/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0903 - acc: 0.0783 - mean_squared_error: 0.0903 - mean_absolute_error: 0.2553 - val_loss: 0.0899 - val_acc: 0.0842 - val_mean_squared_error: 0.0899 - val_mean_absolute_error: 0.2544
Epoch 3/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0886 - acc: 0.0807 - mean_squared_error: 0.0886 - mean_absolute_error: 0.2524 - val_loss: 0.0880 - val_acc: 0.0862 - val_mean_squared_error: 0.0880 - val_mean_absolute_error: 0.2529
Epoch 4/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0865 - acc: 0.0886 - mean_squared_error: 0.0865 - mean_absolute_error: 0.2488 - val_loss: 0.0875 - val_acc: 0.1081 - val_mean_squared_error: 0.0875 - val_mean_absolute_error: 0.2534
Epoch 5/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0849 - acc: 0.0925 - mean_squared_error: 0.0849 - mean_absolute_error: 0.2461 - val_loss: 0.0851 - val_acc: 0.0972 - val_mean_squared_error: 0.0851 - val_mean_absolute_error: 0.2427
Epoch 6/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0832 - acc: 0.1002 - mean_squared_error: 0.0832 - mean_absolute_error: 0.2435 - val_loss: 0.0817 - val_acc: 0.1075 - val_mean_squared_error: 0.0817 - val_mean_absolute_error: 0.2400
Epoch 7/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0819 - acc: 0.1041 - mean_squared_error: 0.0819 - mean_absolute_error: 0.2408 - val_loss: 0.0796 - val_acc: 0.1129 - val_mean_squared_error: 0.0796 - val_mean_absolute_error: 0.2374
Epoch 8/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0810 - acc: 0.1060 - mean_squared_error: 0.0810 - mean_absolute_error: 0.2391 - val_loss: 0.0787 - val_acc: 0.1129 - val_mean_squared_error: 0.0787 - val_mean_absolute_error: 0.2348
Epoch 9/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0794 - acc: 0.1089 - mean_squared_error: 0.0794 - mean_absolute_error: 0.2358 - val_loss: 0.0789 - val_acc: 0.1102 - val_mean_squared_error: 0.0789 - val_mean_absolute_error: 0.2337
Epoch 10/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0785 - acc: 0.1086 - mean_squared_error: 0.0785 - mean_absolute_error: 0.2343 - val_loss: 0.0767 - val_acc: 0.1143 - val_mean_squared_error: 0.0767 - val_mean_absolute_error: 0.2328

I'm having a hard time diagnosing what the problem is. Do I need additional regularization? Here's an example of an input vector and its corresponding ground truth:

input = array([[0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [5.05487319e-04],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [2.11865474e-03],
   [6.57073860e-04],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [8.02714614e-04],
   [1.09597877e-03],
   [5.37978732e-03],
   [9.74035809e-03],
   [0.00000000e+00],
   [0.00000000e+00],
   [2.04473307e-03],
   [5.60562907e-04],
   [1.76158615e-03],
   [3.48869003e-03],
   [6.45111735e-02],
   [7.75741303e-01],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [1.33064182e-02],
   [5.04751340e-02],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [5.90069050e-04],
   [3.27240480e-03],
   [1.92582590e-03],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [0.00000000e+00],
   [4.50609885e-04],
   [1.12957157e-03],
   [1.24890352e-03]])

 output = array([[0.        ],
   [0.41666667],
   [0.58823529]])

Could it have to do with how the data is normalized, or with the nature of my data? Do I just not have enough data? Any insight is appreciated; I've tried advice from many other posts, but nothing has worked yet. Thanks!

    Cannot see your accuracy (training or validation) "converging quickly" - both go from 0.07 to ~ 0.1; but this is irrelevant, because you are in a regression setting, where **accuracy is meaningless**. See the discussion in [What function defines accuracy in Keras when the loss is mean squared error (MSE)?](https://stackoverflow.com/questions/48775305/what-function-defines-accuracy-in-keras-when-the-loss-is-mean-squared-error-mse/48788577#48788577) – desertnaut Jan 08 '19 at 16:51
  • Ahh I see, I've only ever previously performed classification with Keras so I didn't realize the accuracy metric didn't translate to regression problems, appreciate the help! Is the "mode" I'm in (in this case, regression) only dictated by the type of activation I use in the output layer? – A. LaBella Jan 08 '19 at 18:27

1 Answer

There are several issues with your question...

To start with, both your training & validation accuracies certainly do not "converge quickly", as you claim (both go from 0.07 to ~0.1); but even if this were the case, I fail to see how it would be a problem (usually people complain about the opposite, i.e. accuracy not converging, or not converging quickly enough).

But all this discussion is irrelevant, simply because you are in a regression setting, where accuracy is meaningless; truth is, in such a case Keras will not "protect" you with a warning or an error. You may find the discussion in What function defines accuracy in Keras when the loss is mean squared error (MSE)? useful (disclaimer: the answer is mine).
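
In a nutshell, and as a rough sketch of what happens under the hood (per the linked answer): since your output has a single unit, requesting 'accuracy' with an mse loss makes Keras silently fall back to binary accuracy, i.e. something equivalent to (y_pred and y_true standing in for your predictions and targets):

import numpy as np
# what 'accuracy' effectively reports here: the fraction of samples where the
# rounded prediction exactly equals the (continuous) target
acc = np.mean(np.round(y_pred) == y_true)

which is clearly not a meaningful quantity when the targets are continuous values in [0, 1].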

So, you should change the model.compile statement as follows:

model.compile(optimizer = Nadam(lr=1e-4), loss = 'mse')

i.e. there is no need for metrics here (and measuring both mse and mae sounds like overkill; I suggest using only one of them).
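
If you do want a human-readable metric alongside the loss, something like this would be enough (just a sketch, keeping only MAE):

model.compile(optimizer = Nadam(lr=1e-4), loss = 'mse', metrics = ['mae'])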

Is the "mode" I'm in (in this case, regression) only dictated by the type of activation I use in the output layer?

No. The "mode" (regression or classification) is determined by your loss function: losses like mse and mae imply regression settings.

Which brings us to the last issue: unless you know that your outputs take values only in [0, 1], you should not use sigmoid as the activation function of your last layer; a linear activation is normally used for regression settings, i.e.:

dense4 = Dense(1,activation='linear')(dense3)

which, since linear is the default activation in Keras (docs), does not even need to be specified explicitly, i.e.:

dense4 = Dense(1)(dense3)

will do the job as well.

desertnaut