
I am trying to use SHAP's summary_plot to inspect the top twenty features of my Random Forest classifier. I first preprocessed my data with MaxAbsScaler and OneHotEncoder, using make_pipeline and ColumnTransformer. When I plot the SHAP values, all of the feature names are numbers, and I cannot figure out how to recover the real feature names. I followed "Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer", but I get the error message below.

First I try the Shapley plot:

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test_scaled)

# Convert X_test_scaled to a DataFrame so its column names can be accessed
X_test_scaled = pd.DataFrame(X_test_scaled)
X_test_scaled.head()
shap.summary_plot(shap_values, features=X_test_scaled, feature_names = X_test_scaled.columns)

This produces a SHAP summary plot whose features are labeled with numbers instead of names.

Then, to get the names, I tried the approach from the linked post:

preprocess['maxabsscaler'].transformers_[1][1]['onehotencoder']\
                   .get_feature_names(categorical_features)

which outputs:

TypeError                                 Traceback (most recent call last) in
      2 #.get_feature_names(categorical_features
      3
----> 4 preprocess['maxabsscaler'].transformers_[1][1]['onehotencoder']\
      5                    .get_feature_names(categorical_features)

TypeError: 'ColumnTransformer' object is not subscriptable
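
The error message suggests that `preprocess` is itself a `ColumnTransformer`, which does not support `[...]` indexing. Below is a minimal sketch of one way to reach the fitted encoder instead, assuming a setup similar to the one described. The toy data, the column names, and the transformer names `"num"` and `"cat"` are hypothetical and not taken from the original code. The `hasattr` check is there because newer scikit-learn releases use `get_feature_names_out`, while older ones use `get_feature_names`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

# Hypothetical toy data standing in for the real feature matrix.
X = pd.DataFrame({
    "age": [25, 40, 60, 33],
    "income": [30.0, 55.0, 80.0, 42.0],
    "color": ["red", "blue", "red", "green"],
})
numeric_features = ["age", "income"]
categorical_features = ["color"]

# Hypothetical transformer names "num" and "cat"; make_pipeline names its
# steps after the lowercased class, hence "onehotencoder" below.
preprocess = ColumnTransformer(
    transformers=[
        ("num", make_pipeline(MaxAbsScaler()), numeric_features),
        ("cat", make_pipeline(OneHotEncoder(handle_unknown="ignore")),
         categorical_features),
    ]
)
X_scaled = preprocess.fit_transform(X)

# A ColumnTransformer is not subscriptable, so go through the fitted
# named_transformers_ attribute, then index into the inner Pipeline.
encoder = preprocess.named_transformers_["cat"]["onehotencoder"]

# Use whichever method this scikit-learn version provides.
if hasattr(encoder, "get_feature_names_out"):
    cat_names = list(encoder.get_feature_names_out(categorical_features))
else:
    cat_names = list(encoder.get_feature_names(categorical_features))

# Output columns follow the order of the transformers list: numeric, then one-hot.
feature_names = numeric_features + cat_names

# The transformed matrix may be sparse; densify it before building a DataFrame.
if hasattr(X_scaled, "toarray"):
    X_scaled = X_scaled.toarray()
X_scaled_df = pd.DataFrame(X_scaled, columns=feature_names)
```

With names recovered this way, `X_scaled_df` (or `feature_names`) could be passed to `shap.summary_plot` through its `features` and `feature_names` arguments in place of the numbered columns. This relies on the column order matching the order in which the transformers were listed.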

  • 1
    Can you simplify this example a bit and add a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? `MaxAbsScaler` should not alter the number of features, but `OneHotEncoder` will introduce a new feature for each unique value in a column: do you have a way you want to name these? (e.g. `blood_pressure_category_1`, `blood_pressure_category_2`, etc.) – Alexander L. Hayes Apr 22 '21 at 19:14

0 Answers