
I made a Keras NN model for fake news detection. My features are the average word length, the average sentence length, the number of punctuation signs, the number of capitalized words, the number of questions, etc. I have 34 features and one output, 0 or 1 (0 for fake news and 1 for real news). I used 50,000 samples for training, 10,000 for testing and 2,000 for validation. The values of my data range from -1 to 10, so there are no big differences between values. I used StandardScaler like this:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.20, random_state=0)

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

validation_features = scaler.transform(validation_features)

My NN:

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping

model = Sequential()
model.add(Dense(34, input_dim = x_train.shape[1], activation = 'relu')) # input layer requires input_dim param
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(150, activation = 'relu'))
model.add(Dense(1, activation='sigmoid')) # sigmoid instead of relu for final probability between 0 and 1

model.compile(loss="binary_crossentropy", optimizer= "adam", metrics=['accuracy'])

es = EarlyStopping(monitor='val_loss', min_delta=0.0, patience=0, verbose=0, mode='auto')
model.fit(x_train, y_train, epochs = 15, shuffle = True, batch_size=64, validation_data=(validation_features, validation_results), verbose=2, callbacks=[es])

scores = model.evaluate(x_test, y_test)
print(model.metrics_names[0], round(scores[0]*100,2), model.metrics_names[1], round(scores[1]*100,2))

Results:

Train on 50407 samples, validate on 2000 samples
Epoch 1/15
 - 3s - loss: 0.3293 - acc: 0.8587 - val_loss: 0.2826 - val_acc: 0.8725
Epoch 2/15
 - 1s - loss: 0.2647 - acc: 0.8807 - val_loss: 0.2629 - val_acc: 0.8745
Epoch 3/15
 - 1s - loss: 0.2459 - acc: 0.8885 - val_loss: 0.2602 - val_acc: 0.8825
Epoch 4/15
 - 1s - loss: 0.2375 - acc: 0.8930 - val_loss: 0.2524 - val_acc: 0.8870
Epoch 5/15
 - 1s - loss: 0.2291 - acc: 0.8960 - val_loss: 0.2423 - val_acc: 0.8905
Epoch 6/15
 - 1s - loss: 0.2229 - acc: 0.8976 - val_loss: 0.2495 - val_acc: 0.8870
12602/12602 [==============================] - 0s 21us/step
loss 23.95 acc 88.81

Accuracy check:

from sklearn.metrics import accuracy_score

prediction = model.predict(validation_features, batch_size=64)

res = []
for p in prediction:
    res.append(p[0].round(0))

# Accuracy with sklearn
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", acc_score)  # 0.887

Saving the model:

model.save("new keras fake news acc 88.7.h5")
scaler_filename = "keras nn scaler.save"
joblib.dump(scaler, scaler_filename)

I have saved that model and that scaler. When I load them and try to make predictions, I get an accuracy of 52%, which is very low given that I had an accuracy of 88.7% when training the model. I applied .transform to my new data for testing:

import pandas as pd
from keras.models import load_model

validation_df = pd.read_csv("validation.csv")
validation_features = validation_df.iloc[:,:-1]
validation_results = validation_df.iloc[:,-1].tolist()

scaler = joblib.load("keras nn scaler.save") 
validation_features = scaler.transform(validation_features)


my_model_1 = load_model("new keras fake news acc 88.7.h5")
prediction = my_model_1.predict(validation_features, batch_size=64)

res = []
for p in prediction:
    res.append(p[0].round(0))

# Accuracy with sklearn - much lower 
acc_score = accuracy_score(validation_results, res)
print("Sklearn acc", round(acc_score,2))  # 0.52

Can you tell me what I am doing wrong? I have read a lot about this on GitHub and Stack Overflow, but I couldn't find the answer.

taga

1 Answer

It is difficult to answer that without your actual data. But there is a smoking gun, raising suspicions that your validation data might be (very) different from your training & test ones; and it comes from your previous question on this:

If i use fit_transform on my [validation set] features, I do not get an error, but I get accuracy of 52%, and that's terrible (because I had 89.1 %).

Although using fit_transform on the validation data is indeed the wrong methodology (the correct one being what you do here), in practice it should not lead to such a large discrepancy in accuracy.

In other words, I have actually seen many cases where people erroneously apply fit_transform to their validation/deployment data without ever realizing the mistake, simply because they do not get any performance discrepancy - hence they are never alerted. And such a situation is indeed to be expected, if all these data are qualitatively similar.
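
To make this concrete, here is a toy sketch (made-up numbers, not your data) of what the two approaches do when the new data are shifted relative to the training data:

import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[0.0], [2.0], [4.0]])     # training values of a single feature
x_val   = np.array([[10.0], [12.0], [14.0]])  # new values from a shifted distribution

scaler = StandardScaler().fit(x_train)

correct = scaler.transform(x_val)                # reuse the training mean/std
wrong   = StandardScaler().fit_transform(x_val)  # refit on the new data

print(correct.ravel())  # roughly [4.9  6.1  7.3] - the shift is visible to the model
print(wrong.ravel())    # [-1.22  0.  1.22] - the shift is hidden, everything looks "normal" again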

But discrepancies such as yours here lead to the strong suspicion that your validation data are actually (very) different from your training & test ones. If this is the case, such performance discrepancies are to be expected: the whole ML practice is founded on the (often implicit) assumption that our data (training, validation, test, real-world deployment data etc.) do not change qualitatively and all come from the same statistical distribution.

So, the next step here is to perform an exploratory analysis of both your training & validation data to investigate this (actually, this is always assumed to be step #0 in any predictive task). I guess that even elementary measures (mean & max/min values etc.) will show if there are strong differences between them, as I suspect.
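
Something as simple as the following would already be informative - a sketch assuming that features (the unscaled training data) and validation_df from your question are both dataframes with the same feature columns:

import pandas as pd

train_stats = features.describe().T[['mean', 'std', 'min', 'max']]
val_stats   = validation_df.iloc[:, :-1].describe().T[['mean', 'std', 'min', 'max']]

# side-by-side comparison per feature; large gaps point to a distribution shift
print(train_stats.join(val_stats, lsuffix='_train', rsuffix='_val'))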

In particular, scikit-learn's StandardScaler uses

z = (x - u) / s

for the transformation, where u is the mean value and s the standard deviation of the data. If these values are significantly different between your training and validation sets, the performance discrepancy is not unexpected.
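
The u and s actually applied at prediction time are stored on the fitted scaler, so you can also inspect them directly against the raw validation features (again just a sketch, reusing the objects from your question):

import numpy as np

print("training mean (u):", scaler.mean_)    # learned when fitting on x_train
print("training std  (s):", scaler.scale_)

raw_val = np.asarray(validation_df.iloc[:, :-1], dtype=float)
print("validation mean:", raw_val.mean(axis=0))
print("validation std: ", raw_val.std(axis=0))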

desertnaut
  • Ok, so for example, this is a model for fake news detection; if my training data have news about finance, economy and politics and I pass sports news as validation, will I get bad results? – taga Apr 02 '20 at 11:47
  • @taga Not sure about this; but since you end up with numeric data, such an investigation should not be very difficult - see updated answer. – desertnaut Apr 02 '20 at 11:54
  • I have heard that I also need to save the weights and load them - is that true? I thought that everything is saved when I do model.save – taga Apr 02 '20 at 21:50
  • @taga yes, that's the idea: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model - see also the sketch after these comments – desertnaut Apr 02 '20 at 23:20
  • In model.fit there is a validation_data parameter; I put validation_data=(validation_features, validation_results) - is that the right thing to do, or should I put the test data there? – taga Apr 08 '20 at 22:55
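
Regarding the comment about saving the weights: model.save already stores the architecture, the learned weights and the optimizer state in a single file, so load_model alone gives you back a usable model - no separate weight file is needed. save_weights/load_weights is just the lower-level alternative, where you have to rebuild the architecture in code first. A minimal sketch (the full-model file name is reused from the question, the weights-only file name is made up):

from keras.models import load_model

model.save("new keras fake news acc 88.7.h5")             # architecture + weights + optimizer state
restored = load_model("new keras fake news acc 88.7.h5")  # ready to predict straight away

model.save_weights("fake_news_weights.h5")                # weights only (hypothetical file name);
# the same architecture must be rebuilt in code before calling
# model.load_weights("fake_news_weights.h5") on it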