How can I get feature names when there is a preprocessor before feature selection?

Question

I tried checking some posts like this, this and this but I still couldn't find what I need.

These are the transformations I'm doing:

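from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
# Assumption: TargetEncoder comes from the category_encoders package
# (scikit-learn 1.3+ also provides sklearn.preprocessing.TargetEncoder).
from category_encoders import TargetEncoder

cat_transformer = Pipeline(steps=[("encoder", TargetEncoder())])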

num_transformer = Pipeline(
    steps=[
        ("scaler", MinMaxScaler()),
        ("poly", PolynomialFeatures(2, interaction_only=True)),
    ]
)

transformer = ColumnTransformer(
    transformers=[
        ("cat", cat_transformer, cat_features),
        ("num", num_transformer, num_features),
    ],
    verbose_feature_names_out=False,
)

logit = LogisticRegression()  # a Pipeline step needs an estimator instance, not the class

model = Pipeline(
    steps=[
        ("preprocessor", transformer),
        ("feature_selection", SelectKBest(k=20)),
        ("logit", logit),
    ]
)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

Now, I want to get the 20 features selected.

I almost got there after doing:

model["feature_selection"].get_feature_names_out()

However, I got weird names like "x1", "x2", "x15" and so on.

I also tried:

model['preprocessor'].get_feature_names_out()

But that didn't work. Then I tried:

model['feature_selection'].get_support()

And got an array full of booleans (which I assume to be the features selected, but I don't know which feature is in each position). I also tried things like transformer['num'], but that didn't work (since it's a ColumnTransformer).

What can I do to get what features were selected for my model?

score 0 · Answer 1 · answered Nov 08 '22 at 14:39

Use model[:-1].get_feature_names_out().

The problem is that your preprocessor outputs a NumPy array, so the feature selection step never sees feature names and falls back to generic ones like "x1". But the pipeline's get_feature_names_out method passes the feature names forward through each transformer in turn, so taking the pipeline without the logistic regression step (model[:-1]) and calling get_feature_names_out on it should give you the names of the selected features.

In scikit-learn version 1.2, you'll have the ability to specify that you want dataframes out of every transformer. If you turn that on, then the feature selection step will have feature names when it's fitted, and so your first approach would work, as would model["logit"].feature_names_in_.
