
I have a GridSearchCV with a pipeline that uses a decision tree as the estimator.

Now I want to plot the decision tree corresponding to the best_estimator_ of the GridSearchCV.

There are some replies on Stack Overflow, but none of them consider a pipeline inside the GridSearchCV.

from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
import numpy as np


#Dummy data
X= [[1,2,3,5], [3,4,5,6], [6,7,8,9], [1,2,3,5], [3,4,5,6], [6,7,8,9]]
y= [50,70,80,2,5,6]

scr = StandardScaler()
dtree = DecisionTreeRegressor(random_state=100)

pipeline_tree = Pipeline([
    ('scaler', scr),
    ('regressor', dtree)
])

param_grid_tree = [{
    'regressor__max_depth': [2, 3],
    'regressor__min_samples_split': [2, 3],
}]
GridSearchCV_tree = GridSearchCV(estimator=pipeline_tree,
                                 param_grid=param_grid_tree, cv=2)


Dtree = GridSearchCV_tree.fit(X, y)


plot_tree(Dtree.best_estimator_, max_depth=5,
          impurity=True,
          feature_names=('X'),
          precision=1, filled=True)

I get:

NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Any ideas?

desertnaut

2 Answers


Since your estimator is a Pipeline object, the best_estimator_ attribute returns a pipeline as well (the whole pipeline, refit on the full data with the best parameters). You then have to access the step that holds your regressor by indexing it, for example:

plot_tree(
    Dtree.best_estimator_['regressor'],  # <-- added indexing here
    max_depth=5,
    impurity=True,
    feature_names=['X1', 'X2', 'X3', 'X4'],   # changed: one name per input feature; ('X') is just the string 'X'
    precision=1,
    filled=True
)
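For reference, here is a self-contained sketch that combines the question's setup with the indexing fix above. It assumes matplotlib is installed; the Agg backend, the variable names `search`, `best_pipeline`, `best_tree` and the output file name `best_tree.png` are illustrative choices, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Dummy data from the question
X = [[1, 2, 3, 5], [3, 4, 5, 6], [6, 7, 8, 9], [1, 2, 3, 5], [3, 4, 5, 6], [6, 7, 8, 9]]
y = [50, 70, 80, 2, 5, 6]

pipeline_tree = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", DecisionTreeRegressor(random_state=100)),
])
param_grid_tree = [{
    "regressor__max_depth": [2, 3],
    "regressor__min_samples_split": [2, 3],
}]
search = GridSearchCV(estimator=pipeline_tree, param_grid=param_grid_tree, cv=2)
search.fit(X, y)

# best_estimator_ is the whole Pipeline, refit on all of X, y with best_params_
best_pipeline = search.best_estimator_
# Index the pipeline by step name to get the fitted DecisionTreeRegressor
best_tree = best_pipeline["regressor"]

fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(
    best_tree,
    impurity=True,
    feature_names=["X1", "X2", "X3", "X4"],  # one name per input column
    precision=1,
    filled=True,
    ax=ax,
)
fig.savefig("best_tree.png")  # hypothetical output file name
print(search.best_params_)
```

Note that the tree is plotted after the scaler step, so the split thresholds it shows are on the standardized features, not the raw X values.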

See the scikit-learn user guide for the different ways to access pipeline steps.
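As a quick sketch of those access methods on a fitted pipeline like the one in the question (the variable names here are illustrative), indexing by name, by position, through `named_steps`, and through the raw `steps` list all return the same fitted regressor object, while slicing returns a sub-pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

X = [[1, 2, 3, 5], [3, 4, 5, 6], [6, 7, 8, 9], [1, 2, 3, 5], [3, 4, 5, 6], [6, 7, 8, 9]]
y = [50, 70, 80, 2, 5, 6]

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", DecisionTreeRegressor(random_state=100)),
]).fit(X, y)

by_name = pipe["regressor"]                      # index by step name
by_position = pipe[-1]                           # index by position (last step)
via_named_steps = pipe.named_steps["regressor"]  # dict-like access
via_steps = pipe.steps[-1][1]                    # raw list of (name, estimator) tuples

# All four expressions refer to the very same fitted estimator object
assert by_name is by_position is via_named_steps is via_steps

# A slice returns a Pipeline; pipe[:-1] is just the fitted scaler step,
# i.e. the data the tree actually sees during fit/predict
scaled = pipe[:-1].transform(X)
```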

In case you are wondering why your error message says the pipeline is not fitted, you can read more about it in my answer here.
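The short version, as a minimal sketch: the fitted attributes that plot_tree needs (such as `tree_`) are stored on the regressor step, not on the Pipeline wrapper. The exact error raised when a Pipeline is passed directly may differ between scikit-learn versions; the check below only inspects where the fitted state lives.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.validation import check_is_fitted

X = [[1, 2, 3, 5], [3, 4, 5, 6], [6, 7, 8, 9], [1, 2, 3, 5], [3, 4, 5, 6], [6, 7, 8, 9]]
y = [50, 70, 80, 2, 5, 6]

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", DecisionTreeRegressor(random_state=100)),
]).fit(X, y)

# The fitted tree structure lives on the regressor step...
print(hasattr(pipe["regressor"], "tree_"))  # True
# ...not on the Pipeline wrapper itself
print(hasattr(pipe, "tree_"))  # False

# check_is_fitted passes silently on the step (it raises NotFittedError otherwise)
check_is_fitted(pipe["regressor"])
```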

afsharov

I found the solution:

I have to use Dtree.best_estimator_['regressor'] instead of Dtree.best_estimator_:

plot_tree(Dtree.best_estimator_['regressor'], max_depth=5,
          impurity=True,
          feature_names=['X1', 'X2', 'X3', 'X4'],
          precision=1, filled=True)
desertnaut