
I'm trying to create an ensemble of a given regressor. With this in mind, I've searched for a way to use sklearn's existing ensemble methods and change the base estimator of the ensemble. The bagging documentation is clear: it says you can change the base estimator by passing your regressor to the "base_estimator" parameter. With GradientBoosting, however, you can pass a regressor in the "init" parameter.

My question is: will passing my regressor in the init parameter of GradientBoosting make it use the regressor I've specified as the base estimator instead of trees? The documentation says that the init value must be "An estimator object that is used to compute the initial predictions", so I don't know whether the estimator I pass in init will in fact be used as the weak learner to be enhanced by the boosting method, or whether it will only be used at the beginning, with all the work afterwards done by decision trees.


1 Answer


No.

GradientBoostingRegressor can only use regression trees as base estimators; from the docs (emphasis mine):

In each stage a regression tree is fit

And as pointed out in a relevant Github thread (HT to Ben Reiniger for pointing this out in the comment below):

the implementation is entirely tied to the assumption that the base estimators are trees
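You can verify this yourself; a minimal sketch (my illustration, not from the linked thread) shows that even with a custom init estimator, the fitted boosting stages are still decision trees:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# init only supplies the initial predictions; it is NOT the weak learner
gbr = GradientBoostingRegressor(init=LinearRegression(),
                                n_estimators=10,
                                random_state=0).fit(X, y)

print(type(gbr.init_).__name__)   # LinearRegression (used once, up front)
# every boosting stage is still a tree:
print({type(t).__name__ for row in gbr.estimators_ for t in row})
# {'DecisionTreeRegressor'}
```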

In order to boost arbitrary base regressors (similar to bagging), you need AdaBoostRegressor, which, again like bagging, also takes a base_estimator argument. But before doing so, you may want to have a look at my own answer in Execution time of AdaBoost with SVM base classifier; quoting:

Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why still today, if you don't specify explicitly the base_classifier argument, it assumes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.
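For completeness, here is a sketch of boosting an arbitrary base regressor with AdaBoostRegressor; I pass the base estimator positionally because the keyword was renamed from base_estimator to estimator in scikit-learn 1.2 (which version you have is an assumption here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# first argument is the base estimator (base_estimator / estimator
# keyword, depending on the sklearn version); here an SVR, although,
# per the caveat quoted above, stable learners like SVMs may not
# benefit much from boosting
ada = AdaBoostRegressor(SVR(), n_estimators=5, random_state=0).fit(X, y)

print({type(e).__name__ for e in ada.estimators_})  # {'SVR'}
```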

desertnaut