I am using the shap library for ML interpretability to better understand the clusters produced by k-means segmentation. In a nutshell, I make some blobs, cluster them with k-means, then take the cluster assignments as labels and train xgboost to predict them. I have 5 clusters, so it is a single-label multi-class classification problem.

import numpy as np
from sklearn.datasets import make_blobs
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans 
import xgboost as xgb
import shap

X, y = make_blobs(n_samples=500, centers=5, n_features=5, random_state=0)
feature_names = ['var_1', 'var_2', 'var_3', 'var_4', 'var_5']
data = pd.DataFrame(X, columns=feature_names)
data['cluster_id'] = y.astype(int).astype(str)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(data[feature_names])
# kmeans_kwargs was never defined in the original snippet, so it is dropped here
kmeans = KMeans(n_clusters=5)
kmeans.fit(scaled_features)
data['predicted_cluster_id'] = kmeans.labels_.astype(int).astype(str)
# scaled_data was never defined either; presumably the scaled features plus the k-means labels
scaled_data = pd.DataFrame(scaled_features, columns=feature_names)
scaled_data['predicted_cluster_id'] = data['predicted_cluster_id']
clf = xgb.XGBClassifier()
clf.fit(scaled_data.iloc[:,:-1], scaled_data['predicted_cluster_id'])
shap.initjs()
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(scaled_data.iloc[[0],:-1].values)  # first row, features only
shap.force_plot(explainer.expected_value[0], shap_values[0], link='logit')  # repeat for each class index i in range(5)

enter image description here

The plots above make sense, since the predicted class is '3'. But why does the base value look like this? Shouldn't it be 1/5? I asked a similar question a while ago, but this time I have already set link='logit'.

enter image description here

G. Macia

1 Answer

link="logit" does not seem right for multiclass, as it's only suitable for binary output. This is why you do not see probabilities summing up to 1.

Let's streamline your code:

import numpy as np
from sklearn.datasets import make_blobs
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans 
import xgboost as xgb
import shap
from scipy.special import softmax, logit, expit
np.random.seed(42)

X, y_true = make_blobs(n_samples=500, centers=5, n_features=3, random_state=0)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=5)
y_predicted = kmeans.fit_predict(X_scaled)

clf = xgb.XGBClassifier()
clf.fit(X_scaled, y_predicted)
shap.initjs()

Then, what you see as expected values in:

explainer = shap.TreeExplainer(clf)
explainer.expected_value
# array([0.67111245, 0.60223354, 0.53357694, 0.50821152, 0.50145331])

are base scores in raw space.

The multi-class raw scores can be converted to probabilities with softmax:

softmax(explainer.expected_value)
# array([0.22229282, 0.20749694, 0.19372895, 0.18887673, 0.18760457])
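To see why the base values look odd under a logit link, it can help to compare an element-wise sigmoid with a joint softmax on the raw base scores printed above. This is only a sketch: the numbers are copied from the output above, and `sigmoid` and `softmax` are small hand-written helpers (not shap functions) standing in for what a logit link and a multiclass normalisation would do.

```python
import numpy as np

# Raw base scores copied from the explainer.expected_value output above.
raw_base = np.array([0.67111245, 0.60223354, 0.53357694, 0.50821152, 0.50145331])

def sigmoid(z):
    """Element-wise inverse of the logit function (binary-style link)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Joint normalisation across classes (multiclass-style link)."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

per_class_sigmoid = sigmoid(raw_base)
joint_softmax = softmax(raw_base)

# Each sigmoid value is a valid number in (0, 1), but together they do not form a distribution.
print("sigmoid:", per_class_sigmoid, "sum =", per_class_sigmoid.sum())
# The softmax values sum to 1 and sit near 1/5 each.
print("softmax:", joint_softmax, "sum =", joint_softmax.sum())
```

Each class is squashed independently by the sigmoid, so nothing forces the five values to add up to 1; only the softmax treats the five raw scores as one distribution.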

shap.force_plot(..., link="logit") doesn't make sense for multiclass, and it seems impossible to switch from raw to probability and still maintain additivity (because softmax(x+y) ≠ softmax(x) + softmax(y)).
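The non-additivity of softmax can be checked numerically. The sketch below uses rounded base scores and a made-up vector of per-class contributions (`contrib` is purely hypothetical, not output from any model) to show that pushing the base and the contributions through softmax separately does not recover the softmax of their sum.

```python
import numpy as np

def softmax(z):
    """Hand-written softmax helper (an assumption, not a shap function)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Raw base scores, rounded from the values above.
base = np.array([0.67, 0.60, 0.53, 0.51, 0.50])
# Hypothetical summed raw contributions per class, for illustration only.
contrib = np.array([-2.0, -1.0, -1.5, 4.0, -1.0])

# Correct route: add in raw space first, then convert to probabilities.
prob_prediction = softmax(base + contrib)

# Naive route: convert each part separately and add the results.
naive = softmax(base) + softmax(contrib)

print("softmax(base + contrib):", prob_prediction, "sum =", prob_prediction.sum())
print("softmax(base) + softmax(contrib):", naive, "sum =", naive.sum())
```

The naive route sums to 2 rather than 1, which is one way to see that raw-space SHAP values cannot simply be relabelled as probability contributions.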

Should you wish to analyze your data in probability space, try KernelExplainer:

from shap import KernelExplainer
masker = shap.maskers.Independent(X_scaled, 100)
ke = KernelExplainer(clf.predict_proba, data=masker.data)
ke.expected_value
# array([0.18976762, 0.1900516 , 0.20042894, 0.19995041, 0.21980143])
shap_values=ke.shap_values(masker.data)
shap.force_plot(ke.expected_value[0], shap_values[0][0])

enter image description here

or a waterfall plot:

from shap import Explanation
shap.waterfall_plot(Explanation(shap_values[0][0],ke.expected_value[0]))

enter image description here

Both plots are now additive in probability space: the SHAP values plus the base probabilities (see above) add up to the predicted probabilities for the 0th data point:

clf.predict_proba(masker.data[0].reshape(1,-1))
# array([[2.2844513e-04, 8.1287889e-04, 6.5225776e-04, 9.9737883e-01,
#         9.2762709e-04]], dtype=float32)
Sergey Bushmanov
  • To me the output of shap.force_plot(ke.expected_value[0], shap_values[0][0]) does not make sense yet. shap_values length is 5, like the num_clusters. y_predicted[0] is '0', So I would then expect the output of this plot f(x) to be 1 not 0. Same issue for the shap.waterfall_plot, I struggle to match the output of the plot with the predicted classes. – G. Macia Nov 27 '20 at 19:48
  • If it still does not make sense to you I can advise several things. First convert shap values to numpy array. The dimension of the array will be [n_classes, n_samples, n_features]. To better see it make num of classes and features different. Then `shap_values_arr[:,i,:].sum(1) +expected_values` will be an array of length n_classes with either raw or probabilities predictions for ith datapoint. Of course, the expected raw can be reconciled with probabilities via softmax and can be checked against `predict_proba` method, as I showed you in the example with waterfall plot – Sergey Bushmanov Nov 28 '20 at 05:56
  • Thanks for your help. What I mean is that if you look into y_predicted[:5] it will return you [0, 0, 2, 3, 3]. So the first row is a "0". This is not consistent with your plot above. – G. Macia Dec 08 '20 at 20:00
  • @G.Macia Thanks for coming back with your comments. The results made, and still make, perfect sense to me. All the printed out results well agree with the plots. As per your last comment, the probabilities for 0th datapoint tell the class is 4 (from `predict_proba`), and I seriously doubt `predict` method will give any different. – Sergey Bushmanov Dec 08 '20 at 20:17
  • As well, please also take note `y_predicted`, as signified in the code, is ground truth, and `predict_proba` is what model learns to predict. So under many circumstances they may differ. – Sergey Bushmanov Dec 08 '20 at 20:20
  • And perhaps one last comment. Shap doesn't tell you whether a prediction is right or wrong. It tells you what forced this particular datapoint to deviate from the base value so that it was predicted as it is. – Sergey Bushmanov Dec 08 '20 at 20:36
  • I see. What exactly is masker = shap.maskers.Independent(X_scaled, 100)? If I look at masker.data[0], it is not equal to X_scaled[0]. I looked at the documentation for shap.KernelExplainer, but it is still not clear to me. – G. Macia Dec 09 '20 at 08:45
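The additivity check suggested in the comments (`shap_values_arr[:,i,:].sum(1) + expected_values`) can be illustrated without shap or xgboost. The sketch below swaps in a toy multi-output linear model, for which the exact SHAP values under independent features are known in closed form as `w_kj * (x_ij - mean_j)`. All names (`W`, `b`, `raw_scores`) are hypothetical, and the `[n_classes, n_samples, n_features]` layout follows the comment above; the layout returned by shap itself may differ between versions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Deliberately different sizes so the axes are easy to tell apart.
n_classes, n_samples, n_features = 5, 8, 3

X = rng.normal(size=(n_samples, n_features))
W = rng.normal(size=(n_classes, n_features))  # toy weights, one row per class
b = rng.normal(size=n_classes)                # toy intercepts

def raw_scores(X):
    """Raw (pre-softmax) scores of the toy linear model, shape (n_samples, n_classes)."""
    return X @ W.T + b

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

mean_x = X.mean(axis=0)
# Base value per class: the model output at the average background point.
expected_values = W @ mean_x + b

# Closed-form SHAP values for a linear model with independent features,
# laid out as [n_classes, n_samples, n_features].
shap_values_arr = W[:, None, :] * (X - mean_x)[None, :, :]

i = 0  # index of the data point to check
reconstructed = shap_values_arr[:, i, :].sum(1) + expected_values

print("reconstructed raw scores:", reconstructed)
print("model raw scores:        ", raw_scores(X)[i])
print("probabilities via softmax:", softmax(reconstructed))
```

The reconstructed vector has length `n_classes` and matches the model's raw scores for point `i`; applying softmax afterwards gives the class probabilities, which is the reconciliation described in the comment.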