I want to select the top K features using SelectKBest and then run GaussianNB:
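from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Keep the 300 features with the highest mutual information scores
selection = SelectKBest(mutual_info_classif, k=300)
data_transformed = selection.fit_transform(data, labels)
new_data_transformed = selection.transform(new_data)
classifier = GaussianNB()
classifier.fit(data_transformed, labels)
# Predict on the transformed new data so the feature count matches training
y_predicted = classifier.predict(new_data_transformed)
acc = accuracy_score(new_data_labels, y_predicted)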
However, I do not get consistent results for accuracy on the same data. The accuracy has been:
0.61063743402354853
0.60678034916768164
0.61733658140479086
0.61652456354039786
0.64778725131952908
0.58384084449857898
This is for the SAME data. I don't do splits etc. I just use two static sets, data and new_data.
Why do the results vary? How do I make sure I get the same accuracy for the same data?
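One likely source of the variation is the scoring function rather than GaussianNB: mutual_info_classif adds a small amount of random noise to continuous features before estimating mutual information, so the scores, and with them the set of selected features, can differ from run to run unless its random_state is fixed. Below is a minimal sketch of pinning the seed with functools.partial. The synthetic data from make_classification, the smaller k=10, and the run_once helper are assumptions for illustration only; they stand in for the real data, new_data and k=300.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-ins for the question's data / new_data (assumption).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=0
)
data, labels = X[:200], y[:200]
new_data, new_data_labels = X[200:], y[200:]


def run_once(seed):
    """Hypothetical helper: one select-fit-predict pass with a fixed seed."""
    # Fix the seed used by mutual_info_classif for its noise injection.
    score_func = partial(mutual_info_classif, random_state=seed)
    selection = SelectKBest(score_func, k=10)
    data_transformed = selection.fit_transform(data, labels)
    new_data_transformed = selection.transform(new_data)

    classifier = GaussianNB()
    classifier.fit(data_transformed, labels)
    y_predicted = classifier.predict(new_data_transformed)
    return accuracy_score(new_data_labels, y_predicted), selection.get_support()


# Two passes with the same seed select the same features.
acc_a, support_a = run_once(seed=0)
acc_b, support_b = run_once(seed=0)
print(acc_a, acc_b)
```

With the seed fixed, both passes should keep the same features and report the same accuracy. If results still vary in a real setup, the remaining randomness would have to come from somewhere else in the pipeline, since GaussianNB itself has no random component.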