
I am attempting to tune an AdaBoostClassifier ("ABC") using a DecisionTreeClassifier ("DTC") as the base_estimator. I would like to tune both ABC and DTC parameters simultaneously, but am not sure how to accomplish this - a Pipeline shouldn't work, as I am not "piping" the output of DTC to ABC. The idea would be to iterate hyperparameters for ABC and DTC in the GridSearchCV estimator.

How can I specify the tuning parameters correctly?

I tried the following, which generated an error below.

[IN]
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

param_grid = {dtc__criterion : ["gini", "entropy"],
              dtc__splitter :   ["best", "random"],
              abc__n_estimators: [none, 1, 2]
             }


DTC = DecisionTreeClassifier(random_state = 11, max_features = "auto", class_weight = "auto",max_depth = None)

ABC = AdaBoostClassifier(base_estimator = DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring = 'roc_auc')

[OUT]
ValueError: Invalid parameter dtc for estimator AdaBoostClassifier(algorithm='SAMME.R',
      base_estimator=DecisionTreeClassifier(class_weight='auto', criterion='gini', max_depth=None,
        max_features='auto', max_leaf_nodes=None, min_samples_leaf=1,
        min_samples_split=2, min_weight_fraction_leaf=0.0,
        random_state=11, splitter='best'),
      learning_rate=1.0, n_estimators=50, random_state=11)
GPB

2 Answers


There are several things wrong in the code you posted:

  1. The keys of the param_grid dictionary need to be strings. You should be getting a NameError.
  2. The key "abc__n_estimators" should just be "n_estimators": you are probably mixing this with the pipeline syntax. Here nothing tells Python that the string "abc" represents your AdaBoostClassifier.
  3. None (and not none) is not a valid value for n_estimators. The default value (probably what you meant) is 50.

Here's the code with these fixes. To set the parameters of your Tree estimator you can use the "__" syntax that allows accessing nested parameters.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.grid_search import GridSearchCV

param_grid = {"base_estimator__criterion": ["gini", "entropy"],
              "base_estimator__splitter": ["best", "random"],
              "n_estimators": [1, 2]
             }

DTC = DecisionTreeClassifier(random_state=11, max_features="auto",
                             class_weight="auto", max_depth=None)

ABC = AdaBoostClassifier(base_estimator=DTC)

# run grid search
grid_search_ABC = GridSearchCV(ABC, param_grid=param_grid, scoring='roc_auc')

Also, 1 or 2 estimators does not really make sense for AdaBoost. But I'm guessing this is not the actual code you're running.
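As a side note: if you're ever unsure which keys param_grid will accept, every scikit-learn estimator exposes them through get_params(), nested ones included. A small sketch (it also allows for the fact that recent scikit-learn versions renamed AdaBoost's base_estimator parameter to estimator, which changes the prefix accordingly):

```python
import inspect
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# The base estimator argument was renamed from base_estimator to estimator
# in recent scikit-learn releases; detect which name this version uses.
arg = ("estimator"
       if "estimator" in inspect.signature(AdaBoostClassifier.__init__).parameters
       else "base_estimator")
abc = AdaBoostClassifier(**{arg: DecisionTreeClassifier()})

# get_params() lists every name GridSearchCV will accept in param_grid,
# including the nested ones that use the double-underscore syntax.
for name in sorted(abc.get_params()):
    print(name)
```

Anything printed with a `__` prefix is forwarded to the nested tree; the rest are AdaBoost's own parameters.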

Hope this helps.

ldirer
  • Points 1. and 3. were transcription errors, my bad! I will try your suggestion for syntax specifying param_grid and report back. But if I understand it correctly, I can use the "__" expression in multiple contexts? I understand your point about the number of estimators, I was coding this first to see if it worked...More soon. – GPB Aug 25 '15 at 22:06
  • @GPB The "__" syntax is also used to specify parameters for (possibly nested) objects in Pipelines. – ldirer Aug 25 '15 at 22:09
  • 1
    @ldirer Can you please explain how your code is tuning the parameters of AdaBoost? Don't we need to do a second grid-search with a parameter grid for the AdaBoost classifier? – user2738815 Sep 07 '17 at 05:15
  • When we use another classifier as a base estimator for AdaBoost, do we not use AdaBoost's own `n_estimators` and `learning_rate` params? – Edison Jul 06 '22 at 00:48
  • @Edison I wrote this a long time ago but I'll hazard an answer: we do use `n_estimators` (and `learning_rate`) from AdaBoost. All parameters in the grid search that don't start with `base_estimator__` are Adaboost's, and the others are 'forwarded' to the object we pass as `base_estimator` argument (`DTC` in the sample). Side note: AdaBoost *always uses another classifier as a base estimator*: it's a 'meta classifier' that works by fitting several version of the 'base estimator' to produce an ensemble of estimators. – ldirer Jul 06 '22 at 08:55
  • @ldirer I posted a [question](https://stackoverflow.com/questions/72893925/adaboost-grid-search-params-cv-metrics-using-decisiontree-as-base-estimator) on this topic. Even if you don't answer it, feel free to comment. Cheers. – Edison Jul 07 '22 at 15:22

Trying to provide a shorter (and hopefully generic) answer.


If you want to grid search within the base estimator of an AdaBoostClassifier, e.g. varying the max_depth or min_samples_leaf of a DecisionTreeClassifier estimator, then you have to use a special syntax in the parameter grid.

abc = AdaBoostClassifier(base_estimator=DecisionTreeClassifier())

parameters = {'base_estimator__max_depth': list(range(2, 11, 2)),
              'base_estimator__min_samples_leaf': [5, 10],
              'n_estimators': [10, 50, 250, 1000],
              'learning_rate': [0.01, 0.1]}

clf = GridSearchCV(abc, parameters, verbose=3, scoring='f1', n_jobs=-1)
clf.fit(X_train, y_train)

So, note the 'base_estimator__max_depth' and 'base_estimator__min_samples_leaf' keys in the parameters dictionary. That's the way to access the hyperparameters of a base estimator for an ensemble algorithm like AdaBoostClassifier when you are doing a grid search; note the __ double-underscore notation in particular. The other two keys in parameters are regular AdaBoostClassifier parameters.
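For completeness, here is a runnable end-to-end sketch of the same idea on synthetic data (the make_classification dataset and the smaller grid are illustrative only, and the snippet allows for newer scikit-learn versions where base_estimator was renamed to estimator):

```python
import inspect
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data standing in for X_train, y_train.
X, y = make_classification(n_samples=200, random_state=0)

# Newer scikit-learn renamed base_estimator to estimator; pick the right name
# so the same grid-key prefix works on either version.
arg = ("estimator"
       if "estimator" in inspect.signature(AdaBoostClassifier.__init__).parameters
       else "base_estimator")
abc = AdaBoostClassifier(**{arg: DecisionTreeClassifier()}, random_state=0)

parameters = {
    f"{arg}__max_depth": [2, 4],   # forwarded to the DecisionTreeClassifier
    "n_estimators": [10, 50],      # AdaBoost's own parameter
}

clf = GridSearchCV(abc, parameters, scoring='f1', cv=3)
clf.fit(X, y)
print(clf.best_params_)
```

After fitting, clf.best_params_ holds the winning combination from both levels, and clf.best_estimator_ is the refitted ensemble you can predict with directly.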

Tirtha