
I have adapted a scikit-learn example to fit my needs.

It preprocesses the columns according to their type: numerical data is scaled, while categorical data is transformed with a OneHotEncoder.

The pipeline then combines the columns through a ColumnTransformer and feeds the result into a LogisticRegression.
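For reference, the pipeline is set up roughly like this (a sketch adapted from the scikit-learn mixed-types example; the column names are placeholders, since my real dataset has dozens of features):

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Placeholder column names
numeric_features = ["age", "fare"]
categorical_features = ["embarked", "sex", "pclass"]

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Numeric columns are scaled, categorical columns are one-hot encoded
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])

clf = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])
```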

Is there an easy way to recover the feature names that correspond to the coefficients of the LogisticRegression right before the end of the pipeline?

Or is manually keeping track of features the best idea? How would I go about that? My dataset has dozens of features, and after one-hot encoding, the linear model gets thousands.

I was able to get the OneHotEncoder's feature names like this: `clf.steps[0][1].transformers_[1][1].steps[1][1].get_feature_names()`.

However, after concatenating them with the numeric feature names, I could not match them up with the coefficients in the LogisticRegression: `clf.steps[1][1].coef_`.
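Concretely, this is the kind of manual matching I attempted, relying on the ColumnTransformer outputting the numeric columns first and the one-hot columns after (again with the placeholder names from above):

```python
# After clf.fit(X_train, y_train)

# The fitted OneHotEncoder inside the categorical sub-pipeline
ohe = clf.steps[0][1].transformers_[1][1].steps[1][1]
cat_names = ohe.get_feature_names()

# ColumnTransformer applies transformers in the order they were given,
# so the numeric columns should come first, followed by the one-hot block
feature_names = list(numeric_features) + list(cat_names)

# Assuming binary classification, coef_ has shape (1, n_features)
coefs = clf.steps[1][1].coef_[0]

assert len(feature_names) == len(coefs)
for name, coef in zip(feature_names, coefs):
    print(name, coef)
```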

  • @Venkatachalam Yes it does. Thank you for linking it, I did not find it in a search: `[scikit-learn] feature names ColumnTransformer OneHotEncoder pipeline`; it seems the SO search indexer does not split tokens by underscore. – danuker Jan 02 '20 at 11:22

0 Answers