
I'm trying to build a decision tree using GridSearchCV and a pipeline, but I get an error when I try to export the tree as an image with graphviz. I searched online and couldn't find anything. One likely cause would have been not using the best_estimator_ attribute, but I did use it here.

Everything else works (accuracy and the other metrics); only exporting the graph fails.

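```python
from sklearn import tree
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import export_graphviz


def TreeOpt(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

    # Scale the features, then fit a decision tree
    std_scl = StandardScaler()
    dec_tree = tree.DecisionTreeClassifier()
    pipe = Pipeline(steps=[('std_slc', std_scl),
                           ('dec_tree', dec_tree)])

    # Grid over split criterion and depth; keys follow the <step>__<param> convention
    criterion = ['gini', 'entropy']
    max_depth = list(range(1, 15))
    parameters = dict(dec_tree__criterion=criterion,
                      dec_tree__max_depth=max_depth)

    tree_gs = GridSearchCV(pipe, parameters)
    tree_gs.fit(X_train, y_train)

    # This is the call that raises NotFittedError
    export_graphviz(
        tree_gs.best_estimator_,
        out_file="dec_tree.dot",
        feature_names=None,
        class_names=None,
        filled=True)
```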

But I get:

```
<ipython-input-2-bb91ec6ba0d9> in <module>
     37         filled=True)
     38 
---> 39 DecTreeOptimizer(X = df.drop(['quality'], axis=1), y = df.quality)
     40 

<ipython-input-2-bb91ec6ba0d9> in DecTreeOptimizer(X, y)
     30     print("Best score: " +  str(tree_GS.best_score_))
     31 
---> 32     export_graphviz( 
     33         tree_GS.best_estimator_,
     34         out_file=("dec_tree.dot"),

~\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\tree\_export.py in export_graphviz(decision_tree, out_file, max_depth, feature_names, class_names, label, filled, leaves_parallel, impurity, node_ids, proportion, rotate, rounded, special_characters, precision)
    767     """
    768 
--> 769     check_is_fitted(decision_tree)
    770     own_file = False
    771     return_string = False

~\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\utils\validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
   1096 
   1097     if not attrs:
-> 1098         raise NotFittedError(msg % {'name': type(estimator).__name__})
   1099 
   1100 

NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
```
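The error message names a `Pipeline`, not a `DecisionTreeClassifier`, which is the clue. A minimal check (assuming the iris dataset as a stand-in for the wine-quality data, and a smaller hypothetical grid) shows what `best_estimator_` actually holds after a grid search over a pipeline:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Iris is only a stand-in dataset for this sketch
X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[("std_slc", StandardScaler()),
                       ("dec_tree", DecisionTreeClassifier(random_state=0))])
gs = GridSearchCV(pipe, {"dec_tree__max_depth": [2, 3]})
gs.fit(X, y)

best = gs.best_estimator_
print(type(best).__name__)               # Pipeline
print(type(best.steps[-1][1]).__name__)  # DecisionTreeClassifier
```

GridSearchCV refits a clone of whatever estimator it was given, so when it is given a pipeline, `best_estimator_` is a whole pipeline and the tree sits inside it as the last step.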

IcarusX

1 Answer


After a long search, I finally found the answer here: "Plot best decision tree with pipeline and GridsearchCV".

The best_estimator_ attribute returns the whole fitted Pipeline rather than the DecisionTreeClassifier inside it, so I just had to index into the pipeline to get the tree step: best_estimator_[1] (and then I found a whole other lot of problems with my code, but that's part 2).
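A sketch of the fixed export, assuming the iris dataset in place of the original data and scikit-learn 0.21 or later (needed for indexing a pipeline with `[1]`). It selects the tree by step name with `named_steps`, which is equivalent to `best_estimator_[1]` here but does not break if steps are reordered:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Iris stands in for the original dataset in this sketch
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=1)

pipe = Pipeline(steps=[("std_slc", StandardScaler()),
                       ("dec_tree", DecisionTreeClassifier(random_state=1))])
params = dict(dec_tree__criterion=["gini", "entropy"],
              dec_tree__max_depth=list(range(1, 15)))
tree_gs = GridSearchCV(pipe, params)
tree_gs.fit(X_train, y_train)

# Pull the fitted tree out of the best pipeline.
# Equivalent alternatives: tree_gs.best_estimator_[1] or tree_gs.best_estimator_[-1]
best_tree = tree_gs.best_estimator_.named_steps["dec_tree"]

# out_file=None returns the DOT source as a string;
# pass out_file="dec_tree.dot" instead to write it to disk as in the question
dot_source = export_graphviz(
    best_tree,
    out_file=None,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    filled=True)
```

Because the tree was trained after the StandardScaler step, the split thresholds in the exported graph are in standardized units, not the original feature units.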

I will leave this here in case anyone else is going to need it. Cheers!

IcarusX