No matter which optimizer or loss/metric configuration I use, my accuracy converges quickly (within 10-20 epochs) while my loss continues to decrease (beyond 100 epochs). I've tried every optimizer available in Keras, and the same trend occurs (although some converge less quickly, and with slightly higher accuracy, than others; Nadam, Adadelta, and Adamax perform the best).
My input is a 64x1 data vector and my output is a 3x1 vector representing 3D coordinates in real space. I have about 2000 training samples and 500 test samples. I've normalized both the inputs and outputs using MinMaxScaler from the scikit-learn preprocessing toolbox, and I shuffle and split the data with scikit-learn's shuffle and train_test_split functions (with a specified random_state). Roughly, the preprocessing looks like this (the variable names and split fraction below are illustrative, not my exact script):
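import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# x_raw: (N, 64) input vectors, y_raw: (N, 3) real-space coordinates
x_scaled = MinMaxScaler().fit_transform(x_raw)
y_scaled = MinMaxScaler().fit_transform(y_raw)

# shuffle, then split with a fixed random state (test_size is a guess that
# matches my ~2000/500 train/test counts)
x_scaled, y_scaled = shuffle(x_scaled, y_scaled, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x_scaled, y_scaled, test_size=0.2, random_state=0)

# add a channel axis so shapes are (samples, 64, 1) and (samples, 3, 1)
x_train, x_test = x_train[..., np.newaxis], x_test[..., np.newaxis]
y_train, y_test = y_train[..., np.newaxis], y_test[..., np.newaxis]

Here's my CNN: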
from keras.layers import Input, Conv1D, MaxPooling1D, Dense, Dropout
from keras.models import Model
from keras.optimizers import Nadam

def cnn(pretrained_weights=None, input_size=(64, 1)):
    inputs = Input(input_size)
    conv1 = Conv1D(64, 2, strides=1, activation='relu')(inputs)
    conv2 = Conv1D(64, 2, strides=1, activation='relu')(conv1)
    pool1 = MaxPooling1D(pool_size=2)(conv2)
    #pool1 = Dropout(0.25)(pool1)
    conv3 = Conv1D(128, 2, strides=1, activation='relu')(pool1)
    conv4 = Conv1D(128, 2, strides=1, activation='relu')(conv3)
    pool2 = MaxPooling1D(pool_size=2)(conv4)
    #pool2 = Dropout(0.25)(pool2)
    conv5 = Conv1D(256, 2, strides=1, activation='relu')(pool2)
    conv6 = Conv1D(256, 2, strides=1, activation='relu')(conv5)
    pool3 = MaxPooling1D(pool_size=2)(conv6)
    #pool3 = Dropout(0.25)(pool3)
    pool4 = MaxPooling1D(pool_size=2)(pool3)  # extra pooling I added to try to regularize
    dense1 = Dense(256, activation='relu')(pool4)
    #dense1 = Dropout(0.5)(dense1)
    dense2 = Dense(64, activation='relu')(dense1)
    #dense2 = Dropout(0.5)(dense2)
    dense3 = Dense(32, activation='relu')(dense2)
    dense4 = Dense(1, activation='sigmoid')(dense3)
    model = Model(inputs=inputs, outputs=dense4)
    #opt = Adam(lr=1e-6, clipvalue=0.01)
    model.compile(optimizer=Nadam(lr=1e-4), loss='mse',
                  metrics=['accuracy', 'mse', 'mae'])
    return model
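(For what it's worth: with kernel size 2 and the default 'valid' padding, I believe the temporal dimension evolves as 64 → 63 → 62 → 31 → 30 → 29 → 14 → 13 → 12 → 6 → 3 through the conv/pool stack, and the Dense layers apply per timestep, so the final Dense(1) emits the 3x1 vector that matches my targets.)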
I tried additional pooling (as can be seen in my code) to regularize the model and reduce overfitting (in case that's the problem), but to no avail. Here's a training run using the parameters above:
model = cnn()
model.fit(x=x_train, y=y_train, batch_size=7, epochs=10, verbose=1, validation_split=0.2, shuffle=True)
Train on 1946 samples, validate on 487 samples
Epoch 1/10
1946/1946 [==============================] - 5s 3ms/step - loss: 0.0932 - acc: 0.0766 - mean_squared_error: 0.0932 - mean_absolute_error: 0.2616 - val_loss: 0.0930 - val_acc: 0.0815 - val_mean_squared_error: 0.0930 - val_mean_absolute_error: 0.2605
Epoch 2/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0903 - acc: 0.0783 - mean_squared_error: 0.0903 - mean_absolute_error: 0.2553 - val_loss: 0.0899 - val_acc: 0.0842 - val_mean_squared_error: 0.0899 - val_mean_absolute_error: 0.2544
Epoch 3/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0886 - acc: 0.0807 - mean_squared_error: 0.0886 - mean_absolute_error: 0.2524 - val_loss: 0.0880 - val_acc: 0.0862 - val_mean_squared_error: 0.0880 - val_mean_absolute_error: 0.2529
Epoch 4/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0865 - acc: 0.0886 - mean_squared_error: 0.0865 - mean_absolute_error: 0.2488 - val_loss: 0.0875 - val_acc: 0.1081 - val_mean_squared_error: 0.0875 - val_mean_absolute_error: 0.2534
Epoch 5/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0849 - acc: 0.0925 - mean_squared_error: 0.0849 - mean_absolute_error: 0.2461 - val_loss: 0.0851 - val_acc: 0.0972 - val_mean_squared_error: 0.0851 - val_mean_absolute_error: 0.2427
Epoch 6/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0832 - acc: 0.1002 - mean_squared_error: 0.0832 - mean_absolute_error: 0.2435 - val_loss: 0.0817 - val_acc: 0.1075 - val_mean_squared_error: 0.0817 - val_mean_absolute_error: 0.2400
Epoch 7/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0819 - acc: 0.1041 - mean_squared_error: 0.0819 - mean_absolute_error: 0.2408 - val_loss: 0.0796 - val_acc: 0.1129 - val_mean_squared_error: 0.0796 - val_mean_absolute_error: 0.2374
Epoch 8/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0810 - acc: 0.1060 - mean_squared_error: 0.0810 - mean_absolute_error: 0.2391 - val_loss: 0.0787 - val_acc: 0.1129 - val_mean_squared_error: 0.0787 - val_mean_absolute_error: 0.2348
Epoch 9/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0794 - acc: 0.1089 - mean_squared_error: 0.0794 - mean_absolute_error: 0.2358 - val_loss: 0.0789 - val_acc: 0.1102 - val_mean_squared_error: 0.0789 - val_mean_absolute_error: 0.2337
Epoch 10/10
1946/1946 [==============================] - 2s 1ms/step - loss: 0.0785 - acc: 0.1086 - mean_squared_error: 0.0785 - mean_absolute_error: 0.2343 - val_loss: 0.0767 - val_acc: 0.1143 - val_mean_squared_error: 0.0767 - val_mean_absolute_error: 0.2328
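For the longer runs, I watch the curves by plotting the fit history, roughly like this (a minimal sketch of my plotting code, not the exact script):

import matplotlib.pyplot as plt

history = model.fit(x=x_train, y=y_train, batch_size=7, epochs=100,
                    verbose=1, validation_split=0.2, shuffle=True)

# accuracy flattens out within 10-20 epochs while the loss keeps falling
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history['loss'], label='loss')
ax1.plot(history.history['val_loss'], label='val_loss')
ax1.set_xlabel('epoch')
ax1.legend()
ax2.plot(history.history['acc'], label='acc')
ax2.plot(history.history['val_acc'], label='val_acc')
ax2.set_xlabel('epoch')
ax2.legend()
plt.show()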
I'm having a hard time diagnosing what the problem is. Do I need additional regularization? Here's an example of an input vector and its corresponding ground truth:
input = array([[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[5.05487319e-04],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[2.11865474e-03],
[6.57073860e-04],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[8.02714614e-04],
[1.09597877e-03],
[5.37978732e-03],
[9.74035809e-03],
[0.00000000e+00],
[0.00000000e+00],
[2.04473307e-03],
[5.60562907e-04],
[1.76158615e-03],
[3.48869003e-03],
[6.45111735e-02],
[7.75741303e-01],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[1.33064182e-02],
[5.04751340e-02],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[5.90069050e-04],
[3.27240480e-03],
[1.92582590e-03],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[0.00000000e+00],
[4.50609885e-04],
[1.12957157e-03],
[1.24890352e-03]])
output = array([[0. ],
[0.41666667],
[0.58823529]])
Could it have to do with how the data is normalized, or with the nature of my data? Do I just not have enough data? Any insight is appreciated; I've tried advice from many other posts, but nothing has worked yet. Thanks!