A similar question has already been asked, but its answer did not help me solve my problem: Sklearn components in pipeline is not fitted even if the whole pipeline is?
I'm trying to use multiple pipelines to preprocess my data, one-hot encoding the categorical features and imputing and scaling the numerical ones (as suggested in this blog).
Even though my classifier reaches 78% accuracy, I can't figure out why I cannot plot the decision tree I'm training or how to fix the problem. Here is the code snippet:
import pandas as pd
import sklearn.tree as tree
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
X = pd.DataFrame(data=data)
Y = pd.DataFrame(data=prediction)
categoricalFeatures = ["race", "gender"]
numericalFeatures = ["age", "number_of_actions"]
categoricalTransformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])
numericTransformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])
preprocessor = ColumnTransformer(transformers=[
    ('num', numericTransformer, numericalFeatures),
    ('cat', categoricalTransformer, categoricalFeatures)
])
classifier = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', tree.DecisionTreeClassifier(max_depth=3))
])
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=12, stratify=Y)
classifier.fit(X_train, y_train)
print("model score: %.3f" % classifier.score(X_test, y_test)) # Prints accuracy of 0.78
text_representation = tree.export_text(classifier)
The last command raises the following error even though the model has been fitted (I assume it's some kind of synchronization issue, but I can't figure out how to solve it):
sklearn.exceptions.NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
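For reference, a likely explanation is that `tree.export_text` expects the fitted decision tree estimator itself, not the `Pipeline` that wraps it. The sketch below is a minimal, self-contained reproduction under that assumption. The tiny dataset is made up purely for illustration (the real `data` and `prediction` are not shown above), and `random_state=0` is added only to make the run repeatable. It pulls the last step out of the fitted pipeline with `named_steps` and passes that to `export_text`.

```python
import pandas as pd
from sklearn import tree
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up stand-in for the real data, which is not shown in the question.
X = pd.DataFrame({
    "race": ["a", "b", "a", "b", "a", "b"],
    "gender": ["f", "m", "m", "f", "f", "m"],
    "age": [25, 32, 47, 51, 38, 29],
    "number_of_actions": [3, 10, 1, 7, 4, 8],
})
y = [0, 1, 0, 1, 0, 1]

# Same structure as the pipeline in the question.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, ["age", "number_of_actions"]),
    ("cat", categorical_transformer, ["race", "gender"]),
])
classifier = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", tree.DecisionTreeClassifier(max_depth=3, random_state=0)),
])
classifier.fit(X, y)

# Extract the fitted tree (the step named 'classifier') from the pipeline
# and export that, instead of passing the whole Pipeline object.
fitted_tree = classifier.named_steps["classifier"]
text_representation = tree.export_text(fitted_tree)
print(text_representation)
```

Without a `feature_names` argument, `export_text` labels the splits `feature_0`, `feature_1`, and so on, following the column order produced by the preprocessor (numeric columns first, then the one-hot columns in this setup).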