
A similar question has already been asked, but its answer did not help me solve my problem: Sklearn components in pipeline is not fitted even if the whole pipeline is?

I'm trying to use multiple pipelines to preprocess my data with a One Hot Encoder for categorical and numerical data (as suggested in this blog).

Even though my classifier reaches 78% accuracy, I cannot plot the decision tree I'm training, and I can't figure out why or how to fix it. Here is the code snippet:

import pandas as pd
import sklearn.tree as tree
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder  
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer


X = pd.DataFrame(data=data)  
Y = pd.DataFrame(data=prediction)

categoricalFeatures = ["race", "gender"]
numericalFeatures = ["age", "number_of_actions"]

categoricalTransformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])

numericTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numericTransformer, numericalFeatures),
    ('cat', categoricalTransformer, categoricalFeatures)
])

classifier = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', tree.DecisionTreeClassifier(max_depth=3))
])

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=12, stratify=Y)

classifier.fit(X_train, y_train)
print("model score: %.3f" % classifier.score(X_test, y_test))  # Prints accuracy of 0.78

text_representation = tree.export_text(classifier)

The last command produces the following error, in spite of the model being fitted (I assume it's some kind of synchronization issue, but I can't figure out how to solve it):

sklearn.exceptions.NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
desertnaut
Mahsan Nourani
  • Error aside, you don't need a pipeline for the `categoricalTransformer`. You can use the OneHotEncoder directly like this: `categoricalTransformer = OneHotEncoder(handle_unknown='ignore')` – Aelius Feb 02 '23 at 10:45
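As a sketch of the comment's suggestion, the one-step pipeline can be dropped and the `OneHotEncoder` used directly inside the `ColumnTransformer`. The column names below come from the question; the tiny data frame is synthetic, just to show the transformer runs:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# A single-step Pipeline adds nothing; use the transformer directly.
categoricalTransformer = OneHotEncoder(handle_unknown='ignore')

numericTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numericTransformer, ["age", "number_of_actions"]),
    ('cat', categoricalTransformer, ["race", "gender"]),
])

# Synthetic stand-in for the question's data.
df = pd.DataFrame({
    "age": [25, 40, None],
    "number_of_actions": [3, 7, 1],
    "race": ["a", "b", "a"],
    "gender": ["f", "m", "f"],
})
out = preprocessor.fit_transform(df)
print(out.shape)  # (3, 6): 2 scaled numeric columns + 4 one-hot columns
```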

1 Answer


You cannot call the export_text function on the whole pipeline, as it only accepts decision tree objects, i.e. DecisionTreeClassifier or DecisionTreeRegressor. Pass only the fitted estimator from your pipeline and it will work:

text_representation = tree.export_text(classifier['classifier'])

The error message stating that the Pipeline object is not fitted comes from scikit-learn's check_is_fitted function. It works by verifying the presence of fitted attributes (those ending with a trailing underscore) on the estimator. Since Pipeline objects do not expose such attributes themselves, the check fails and raises the error even though the pipeline is in fact fitted. But that is not a problem, since Pipeline objects are not meant to be used this way anyway.
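The trailing-underscore convention can be observed directly on a plain estimator; a minimal sketch:

```python
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

clf = DecisionTreeClassifier()
try:
    check_is_fitted(clf)  # no fitted attributes exist yet
except NotFittedError as e:
    print("before fit:", type(e).__name__)

clf.fit([[0], [1]], [0, 1])
check_is_fitted(clf)  # passes now: fit() set e.g. clf.tree_ and clf.classes_
print("after fit: OK")
```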

afsharov
  • This is amazing! Thanks so much for your quick and complete response, @afsharov ! Follow-up question: is there a way to actually get the new list of feature names using this pipeline? (based on the one hot encoder in the pipeline?) – Mahsan Nourani Jun 11 '21 at 22:19
  • Never mind! Found a solution [here](https://stackoverflow.com/questions/54646709/sklearn-pipeline-get-feature-names-after-onehotencode-in-columntransformer) that in my case, will work as below: `classifier['preprocessor'].transformers_[1][1]['onehot']\ .get_feature_names(categoricalFeatures)` – Mahsan Nourani Jun 11 '21 at 22:41