I am trying to create an ensemble of three classifiers (Random Forest, Support Vector Machine, and XGBoost) using VotingClassifier() in scikit-learn. However, the accuracy of the ensemble is actually lower than that of the best individual classifier, and I can't figure out why.
Here is the code:
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

# Soft-voting ensemble of the three tuned classifiers, with XGBoost weighted higher
eclf = VotingClassifier(estimators=[('rf', rf_optimized), ('svc', svc_optimized), ('xgb', xgb_optimized)],
                        voting='soft', weights=[1, 1, 2])

# Compare each individual classifier and the ensemble with 10-fold cross-validation
for clf, label in zip([rf_optimized, svc_optimized, xgb_optimized, eclf],
                      ['Random Forest', 'Support Vector Machine', 'XGBoost', 'Ensemble']):
    scores = cross_val_score(clf, X, y, cv=10, scoring='accuracy')
    print("Accuracy: %0.3f (+/- %0.3f) [%s]" % (scores.mean(), scores.std(), label))
XGBoost has the highest individual accuracy, so I even tried giving it more weight, but to no avail.
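For example, one of the heavier weightings I tried looked roughly like this (the exact weight values and the eclf_heavy name are just illustrative):

# Re-building the ensemble with XGBoost weighted more heavily (illustrative weights)
eclf_heavy = VotingClassifier(estimators=[('rf', rf_optimized), ('svc', svc_optimized), ('xgb', xgb_optimized)],
                              voting='soft', weights=[1, 1, 4])
scores = cross_val_score(eclf_heavy, X, y, cv=10, scoring='accuracy')
print("Accuracy: %0.3f (+/- %0.3f) [Ensemble, heavier XGBoost weight]" % (scores.mean(), scores.std()))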
What could I be doing wrong?