I am trying to use the SHAP summary_plot to inspect the top twenty features of my Random Forest classifier. Before training, I preprocessed my data with MaxAbsScaler and OneHotEncoder, using make_pipeline and ColumnTransformer. When I plot the SHAP values, all of the feature names are numbers, and I cannot figure out how to map them back to the real feature names. I followed the answer to "Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer", but I get the error message below.
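For reference, here is a minimal sketch of a preprocessing setup like the one described. The question does not show its own code, so the column names (age, income, color) and the branch names ("num", "cat") are hypothetical. The sketch highlights that make_pipeline names each step after its lowercased class, so 'maxabsscaler' and 'onehotencoder' are step names inside the branch pipelines, one level below the ColumnTransformer:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

# Hypothetical column lists; the real ones are not shown in the question.
numeric_features = ["age", "income"]
categorical_features = ["color"]

# One small pipeline per branch of the ColumnTransformer.
num_pipe = make_pipeline(MaxAbsScaler())
cat_pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"))

# make_pipeline names each step after its class name, lowercased.
print(list(num_pipe.named_steps))  # ['maxabsscaler']
print(list(cat_pipe.named_steps))  # ['onehotencoder']

# The ColumnTransformer is the outer object; "num" and "cat" are its branch names.
preprocess = ColumnTransformer([
    ("num", num_pipe, numeric_features),
    ("cat", cat_pipe, categorical_features),
])

# Hypothetical toy data.
X = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40_000, 52_000, 61_000, 58_000],
    "color": ["red", "blue", "red", "green"],
})
X_scaled = preprocess.fit_transform(X)
print(X_scaled.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot columns
```

In this layout, preprocess itself is the ColumnTransformer, so the step names only become reachable after first selecting a branch.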
First, I tried the SHAP summary plot:
import pandas as pd
import shap

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test_scaled)
# Convert X_test_scaled to a DataFrame so its column names can be passed to SHAP
X_test_scaled = pd.DataFrame(X_test_scaled)
X_test_scaled.head()
shap.summary_plot(shap_values, features=X_test_scaled, feature_names=X_test_scaled.columns)
This produces a summary plot in which every feature is labeled with a number instead of its name.
Then, to get the feature names, I tried the approach from that post:
preprocess['maxabsscaler'].transformers_[1][1]['onehotencoder']\
.get_feature_names(categorical_features)
which fails with:
TypeError                                 Traceback (most recent call last)
      2 #.get_feature_names(categorical_features
      3
----> 4 preprocess['maxabsscaler'].transformers_[1][1]['onehotencoder']\
      5     .get_feature_names(categorical_features)

TypeError: 'ColumnTransformer' object is not subscriptable