
I want to use StackingClassifier to combine some classifiers and then use GridSearchCV to optimize the parameters:

clf1 = RandomForestClassifier()
clf2 = LogisticRegression()
dt = DecisionTreeClassifier()
sclf = StackingClassifier(estimators=[clf1, clf2], final_estimator=dt)

params = {'randomforestclassifier__n_estimators': [10, 50],
          'logisticregression__C': [1,2,3]}

grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5)

grid.fit(x, y)

But this raises an error:

'RandomForestClassifier' object has no attribute 'estimators_'

I did set n_estimators. Why does it complain that there is no estimators_?

Usually GridSearchCV is applied to a single model, so I just need to write that single model's parameter names in a dict, for example (values here are just illustrative):
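
params = {'n_estimators': [10, 50]}  # plain names, no prefix, when tuning a single model
grid = GridSearchCV(estimator=RandomForestClassifier(), param_grid=params, cv=5)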

I referred to this page https://groups.google.com/d/topic/mlxtend/5GhZNwgmtSg but it uses the parameters of an earlier version. Even after changing to the new parameter names, it doesn't work.

By the way, where can I learn the details of the naming rules for these params?

– xiaoluohao
  • if you execute `sclf.fit(X,y)` what do you get? – seralouk May 10 '20 at 12:44
  • 'RandomForestClassifier' object has no attribute 'estimators_' – xiaoluohao May 10 '20 at 12:44
  • see my answer. hope it helps – seralouk May 10 '20 at 13:03
  • Note that `estimators_` are fitted on the full X, while `final_estimator_` is trained on cross-validated predictions of the base estimators using `cross_val_predict` [from here](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html). I guess the grid search for the base estimators is performed within each fold and the best predictors then provide the predictions for training `final_estimator_`, but how is the grid search for that one performed? – makpalan Nov 02 '21 at 21:08

2 Answers


First of all, the estimators argument needs to be a list of tuples, each containing a model together with the name you assign to it:

estimators = [('model1', model1()),  # model1() given the name 'model1' by me
              ('model2', model2())]  # model2() given the name 'model2' by me

Next, you need to use the names as they appear in sclf.get_params(). That name is the same as the one you gave to the specific model in the above estimators list. So, for the model1 parameters here you need:

params = {'model1__n_estimators': [5,10]} # model1__SOME_PARAM 
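
A quick way to see every valid key is to print them from the constructed sclf (the exact names depend on the names you assigned):

print(sorted(sclf.get_params().keys()))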

Working toy example:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import GridSearchCV


X, y = make_classification(n_samples=1000, n_features=4, 
                            n_informative=2, n_redundant=0,
                            random_state=0, shuffle=False)


estimators = [('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
              ('logreg', LogisticRegression())]

sclf = StackingClassifier(estimators=estimators, final_estimator=DecisionTreeClassifier())

params = {'rf__n_estimators': [5,10]}

grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5)
grid.fit(X, y)
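
Note that the final estimator's hyperparameters can be searched in the same grid through the final_estimator__ prefix. A minimal sketch with illustrative values, reusing sclf from above:

params = {'rf__n_estimators': [5, 10],
          'final_estimator__max_depth': [2, 4]}  # illustrative grid for the DecisionTreeClassifier
grid = GridSearchCV(estimator=sclf, param_grid=params, cv=5)
grid.fit(X, y)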
– seralouk

After some trial and error, I think I found a working solution.

The key to solving this problem is to use get_params() to learn the parameter names of the StackingClassifier.

I create sclf in a different way:

clf1 = RandomForestClassifier()
clf2 = LogisticRegression()
dt = DecisionTreeClassifier()
estimators = [('rf', clf1),
              ('lr', clf2)]
sclf = StackingClassifier(estimators=estimators, final_estimator=dt)
params = {'rf__n_estimators': list(range(100, 1000, 100)),
          'lr__C': list(range(1, 10, 1))}
grid = GridSearchCV(estimator=sclf, param_grid=params, verbose=2, cv=5, n_jobs=-1)
grid.fit(x, y)

This way, I can name each base classifier and then set the params using those names. After fitting, I can read off the chosen combination with the standard GridSearchCV attributes:
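
print(grid.best_params_)  # best parameter combination found by the search
print(grid.best_score_)   # mean cross-validated score of that combination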

– xiaoluohao