The ELI5 library provides the function transform_feature_names to retrieve the feature names for the output of an sklearn transformer. The documentation says that the function works out of the box when the transformer includes nested Pipelines.
I'm trying to get the function to work on a simplified version of the example in the answer to SO 57528350. My simplified example doesn't need Pipeline, but in real life I will need it in order to add steps to categorical_transformer, and I will also want to add transformers to the ColumnTransformer.
import eli5
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X_train = pd.DataFrame({'age': [23, 12, 12, 18],
                        'gender': ['M', 'F', 'F', 'F'],
                        'income': ['high', 'low', 'low', 'medium'],
                        'y': [0, 1, 1, 1]})

categorical_features = ['gender', 'income']
categorical_transformer = Pipeline(
    steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))])

transformers = [('categorical', categorical_transformer, categorical_features)]
preprocessor = ColumnTransformer(transformers)

# fit() returns the fitted transformer itself, not transformed data
preprocessor.fit(X_train)

eli5.transform_feature_names(preprocessor, list(X_train.columns))
This dies with the message
AttributeError: Transformer categorical (type Pipeline) does not provide get_feature_names.
Since the Pipeline is nested in the ColumnTransformer, I understood from the ELI5 documentation that it would be handled.
Do I need to create a modified version of Pipeline with a get_feature_names method or make other custom modifications in order to take advantage of the ELI5 function?
I'm using Python 3.7.6, eli5 0.10.1, pandas 0.25.3, and sklearn 0.22.1.
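For comparison, here is a minimal sketch of a manual fallback that bypasses transform_feature_names entirely. It assumes the one-hot step is the only step in the nested Pipeline that changes the columns. The helper onehot_feature_names is hypothetical (it is not part of ELI5 or sklearn), and the "column_category" naming scheme is my own choice; it relies only on the fitted encoder's categories_ attribute.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'age': [23, 12, 12, 18],
                        'gender': ['M', 'F', 'F', 'F'],
                        'income': ['high', 'low', 'low', 'medium'],
                        'y': [0, 1, 1, 1]})

categorical_features = ['gender', 'income']
categorical_transformer = Pipeline(
    steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
    [('categorical', categorical_transformer, categorical_features)])
preprocessor.fit(X_train)


def onehot_feature_names(fitted_preprocessor, name, step, columns):
    """Hypothetical helper: build output names from the encoder's categories_.

    name    -- name of the nested Pipeline inside the ColumnTransformer
    step    -- name of the OneHotEncoder step inside that Pipeline
    columns -- input columns that were routed to that Pipeline
    """
    encoder = fitted_preprocessor.named_transformers_[name].named_steps[step]
    # categories_ holds one sorted array of categories per input column
    return ['{}_{}'.format(col, cat)
            for col, cats in zip(columns, encoder.categories_)
            for cat in cats]


feature_names = onehot_feature_names(
    preprocessor, 'categorical', 'onehot', categorical_features)
print(feature_names)
# ['gender_F', 'gender_M', 'income_high', 'income_low', 'income_medium']
```

This does not generalize to extra Pipeline steps or additional transformers in the ColumnTransformer, which is why I would still prefer to get the ELI5 function working.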