I built a simple stacking classifier with mlxtend and am trying different base classifiers, and I am facing an interesting situation. From all my research, it seems that a stacking classifier is expected to perform at least as well as its best base classifier.

In my case, when I cross-validate the stacking classifier on the training set, I get a lower score than some of the base estimators. In addition, the stacking classifier's average CV score often comes out equal to the lowest of the base estimators' average CV scores.

Isn't this weird? Even more strangely, once I run a GridSearchCV on the stacking classifier, select the best parameters, retrain on the entire training set, and finally compute accuracy on the test set, I actually get a pretty good score.

I know this method is prone to leakage, and there are other techniques to cross-validate a stacking classifier, but they seem to be extremely slow, and from my research the above approach seems acceptable. (About this potential leakage, this Kaggle guide to stacking even says, "In practice, everyone ignores this theoretical hole (and frankly I think most people are unaware it even exists!)"; see the parameter tuning paragraph of http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/)
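To be concrete, the slow leakage-free alternative I am referring to is nesting the grid search inside an outer CV loop. A minimal sketch of what I mean, reusing the sclf and param_grid defined further down:

from sklearn.model_selection import GridSearchCV, cross_val_score

# Nested CV: GridSearchCV retunes the meta-classifier on every outer training
# fold, so no outer test fold ever influences the tuning.
nested_gs = GridSearchCV(sclf, param_grid, cv=3)
nested_scores = cross_val_score(nested_gs, X_train, y_train, cv=5)
print("Nested CV Accuracy: %0.2f (+/- %0.2f)"
      % (nested_scores.mean(), nested_scores.std()))

With 5 outer folds this refits the entire grid search (and every internal fold of the StackingCVClassifier) five times, which is why it is so slow.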
import numpy as np
from mlxtend.classifier import StackingCVClassifier
from sklearn import model_selection, preprocessing
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, classification_report
RANDOM_SEED = 12

# df is imported in a separate code snippet
y = df['y']
X = df.drop(columns=['y'])

scaler = preprocessing.StandardScaler().fit(X)
X_transformed = scaler.transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, random_state=4)
def gridSearch_clf(clf, param_grid, X_train, y_train):
    gs = GridSearchCV(clf, param_grid).fit(X_train, y_train)
    print("Best Parameters")
    print(gs.best_params_)
    return gs.best_estimator_

def gs_report(y_test, X_test, best_estimator):
    print(classification_report(y_test, best_estimator.predict(X_test)))
    print("Overall Accuracy Score: ")
    print(accuracy_score(y_test, best_estimator.predict(X_test)))
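# NOTE: print_cv is not shown in this snippet; it is assumed to follow the
# cross-validation loop from the mlxtend StackingCVClassifier docs
# (cv=5 here is a guess):
def print_cv(clfs, clf_names):
    for clf, name in zip(clfs, clf_names):
        scores = model_selection.cross_val_score(clf, X_train, y_train,
                                                 cv=5, scoring='accuracy')
        print("Accuracy: %0.2f (+/- %0.2f) [%s]"
              % (scores.mean(), scores.std(), name))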
lr = LogisticRegression()
np.random.seed(RANDOM_SEED)

# best_clf1/2/3 are the grid-searched base estimators from an earlier snippet
# (judging by the CV output below: a decision tree, a KNN, and a Bernoulli NB)
sclf = StackingCVClassifier(classifiers=[best_clf1, best_clf2, best_clf3],
                            meta_classifier=lr)
clfs = [best_clf1, best_clf2, best_clf3, sclf]
clf_names = [i.__class__.__name__ for i in clfs]
print_cv(clfs, clf_names)
Accuracy: 0.68 (+/- 0.30) [Decision Tree Classifier]
Accuracy: 0.55 (+/- 0.26) [K Neighbors Classifier]
Accuracy: 0.67 (+/- 0.32) [Bernoulli Naive Bayes]
Accuracy: 0.55 (+/- 0.26) [StackingClassifier]
## StackingClassifier Accuracy = KNN Classifier Accuracy
# C grid: [0.01, 0.1, 1, 10, 100, 1000]
param_grid = {'meta-logisticregression__C': np.logspace(-2, 3, num=6, base=10)}
best_sclf = gridSearch_clf(sclf, param_grid, X_train, y_train)
gs_report(y_test, X_test, best_sclf)
Best Parameters
{'meta-logisticregression__C': 0.1}
             precision    recall  f1-score   support

          0       0.91      0.99      0.95      9131
          1       0.68      0.22      0.33      1166

avg / total       0.88      0.90      0.88     10297
Overall Accuracy Score:
0.9000679809653297