3

Update: I found the color_bar and color_bar_label arguments, but they don't affect it. I also discovered that if I display 26 or more features the bar does appear, but it is small and thin, as in the LoL example below. I've also tried changing the size of the plot and the spacing between feature names, with no luck.

I am trying to create a SHAP summary plot. The plot itself appears, but the vertical "feature value" color bar on the y-axis doesn't appear at all.

[image: SHAP real data plot]

The force plots and decision plots all work fine. I've tried changing the maximum number of features to see if the axis just needed to be extended, but that didn't fix anything. I am using Python 3.9.7 (because of issues with 3.10 and some of the arches packages, I think) and SHAP 0.39.0 in Jupyter Notebook. I have tried updating, uninstalling, and reinstalling SHAP via conda (4.10.3). I even went to the SHAP walkthrough here and followed it exactly: a vertical feature value bar does appear, but it is very small.

[image: SHAP test plot]

For reference, this is what the walkthrough says it should look like.

[image: summary plot as shown in the walkthrough]

I can't figure out what the bar itself is called or which setting to change to get it to appear. There are no error messages or warnings. The bar just doesn't show up at all in my real use case, and it shows up very small with the example code.

The dataset for the walkthrough is from Kaggle, here, and the walkthrough code that generates the example plot is below:

import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
import shap
import matplotlib.pyplot as pl

shap.initjs()

# read in the data
prefix = "local_scratch/data/league-of-legends-ranked-matches/"
matches = pd.read_csv(prefix+"matches.csv")
participants = pd.read_csv(prefix+"participants.csv")
stats1 = pd.read_csv(prefix+"stats1.csv", low_memory=False)
stats2 = pd.read_csv(prefix+"stats2.csv", low_memory=False)
stats = pd.concat([stats1,stats2])

# merge into a single DataFrame
a = pd.merge(participants, matches, left_on="matchid", right_on="id")
allstats_orig = pd.merge(a, stats, left_on="matchid", right_on="id")
allstats = allstats_orig.copy()

# drop games that lasted less than 10 minutes
allstats = allstats.loc[allstats["duration"] >= 10*60,:]

# Convert string-based categories to numeric values
cat_cols = ["role", "position", "version", "platformid"]
for c in cat_cols:
    allstats[c] = allstats[c].astype('category')
    allstats[c] = allstats[c].cat.codes
allstats["wardsbought"] = allstats["wardsbought"].astype(np.int32)

X = allstats.drop(["win"], axis=1)
y = allstats["win"]

# convert all features we want to consider as rates
rate_features = [
    "kills", "deaths", "assists", "killingsprees", "doublekills",
    "triplekills", "quadrakills", "pentakills", "legendarykills",
    "totdmgdealt", "magicdmgdealt", "physicaldmgdealt", "truedmgdealt",
    "totdmgtochamp", "magicdmgtochamp", "physdmgtochamp", "truedmgtochamp",
    "totheal", "totunitshealed", "dmgtoobj", "timecc", "totdmgtaken",
    "magicdmgtaken" , "physdmgtaken", "truedmgtaken", "goldearned", "goldspent",
    "totminionskilled", "neutralminionskilled", "ownjunglekills",
    "enemyjunglekills", "totcctimedealt", "pinksbought", "wardsbought",
    "wardsplaced", "wardskilled"
]
for feature_name in rate_features:
    X[feature_name] /= X["duration"] / 60 # per minute rate

# convert to fraction of game
X["longesttimespentliving"] /= X["duration"]

# define friendly names for the features
full_names = {
    "kills": "Kills per min.",
    "deaths": "Deaths per min.",
    "assists": "Assists per min.",
    "killingsprees": "Killing sprees per min.",
    "longesttimespentliving": "Longest time living as % of game",
    "doublekills": "Double kills per min.",
    "triplekills": "Triple kills per min.",
    "quadrakills": "Quadra kills per min.",
    "pentakills": "Penta kills per min.",
    "legendarykills": "Legendary kills per min.",
    "totdmgdealt": "Total damage dealt per min.",
    "magicdmgdealt": "Magic damage dealt per min.",
    "physicaldmgdealt": "Physical damage dealt per min.",
    "truedmgdealt": "True damage dealt per min.",
    "totdmgtochamp": "Total damage to champions per min.",
    "magicdmgtochamp": "Magic damage to champions per min.",
    "physdmgtochamp": "Physical damage to champions per min.",
    "truedmgtochamp": "True damage to champions per min.",
    "totheal": "Total healing per min.",
    "totunitshealed": "Total units healed per min.",
    "dmgtoobj": "Damage to objects per min.",
    "timecc": "Time spent with crowd control per min.",
    "totdmgtaken": "Total damage taken per min.",
    "magicdmgtaken": "Magic damage taken per min.",
    "physdmgtaken": "Physical damage taken per min.",
    "truedmgtaken": "True damage taken per min.",
    "goldearned": "Gold earned per min.",
    "goldspent": "Gold spent per min.",
    "totminionskilled": "Total minions killed per min.",
    "neutralminionskilled": "Neutral minions killed per min.",
    "ownjunglekills": "Own jungle kills per min.",
    "enemyjunglekills": "Enemy jungle kills per min.",
    "totcctimedealt": "Total crowd control time dealt per min.",
    "pinksbought": "Pink wards bought per min.",
    "wardsbought": "Wards bought per min.",
    "wardsplaced": "Wards placed per min.",
    "turretkills": "# of turret kills",
    "inhibkills": "# of inhibitor kills",
    "dmgtoturrets": "Damage to turrets"
}
feature_names = [full_names.get(n, n) for n in X.columns]
X.columns = feature_names

# create train/validation split
Xt, Xv, yt, yv = train_test_split(X,y, test_size=0.2, random_state=10)
dt = xgb.DMatrix(Xt, label=yt.values)
dv = xgb.DMatrix(Xv, label=yv.values)

params = {
    "eta": 0.5,
    "max_depth": 4,
    "objective": "binary:logistic",
    "silent": 1,
    "base_score": np.mean(yt),
    "eval_metric": "logloss"
}
model = xgb.train(params, dt, 300, [(dt, "train"),(dv, "valid")], early_stopping_rounds=5, verbose_eval=25)

# compute the SHAP values for every prediction in the validation dataset
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(Xv)

shap.summary_plot(shap_values, Xv)
semanning
  • Does this answer your question? [Shap - The color bar is not displayed in the summary plot](https://stackoverflow.com/questions/70461753/shap-the-color-bar-is-not-displayed-in-the-summary-plot) – Jeremy Caney Jan 03 '22 at 17:51

3 Answers

5

It seems someone else asked the same question only a couple of weeks after I asked here, and one solution was to downgrade matplotlib from 3.5. I downgraded to 3.4.3 and the issue is resolved.
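To tell whether an environment is likely to show the problem, a small version check can help. This is only a sketch: the function name is made up for illustration, and the 3.5 threshold comes from this thread, which confirms only that 3.5 showed the problem and 3.4.3 did not. Treating later matplotlib releases as affected is an assumption.

```python
def needs_colorbar_workaround(mpl_version: str) -> bool:
    """Return True if the given matplotlib version is 3.5 or newer.

    Hypothetical helper: the thread only confirms that 3.5 shows the
    missing/thin color bar with SHAP 0.39.0 and that 3.4.3 does not.
    Assuming anything newer than 3.5 behaves the same is a guess.
    """
    # Compare only major.minor, so strings like "3.5.0rc1" still parse.
    major, minor = (int(part) for part in mpl_version.split(".")[:2])
    return (major, minor) >= (3, 5)
```

In a notebook this could be called as needs_colorbar_workaround(matplotlib.__version__) after importing matplotlib.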

semanning
2

As mentioned above, it seems that the handling of the colorbar or box aspect ratio changed in matplotlib 3.5. However, you can correct for that:

  • Use shap.summary_plot(..., show=False) so the plot can be altered before it is displayed
  • Set the aspect of the colorbar with plt.gcf().axes[-1].set_aspect(1000)
  • Then also set the aspect of the colorbar's box with plt.gcf().axes[-1].set_box_aspect(1000)

This gives you the old result back. If you want to make the colorbar thicker, set the aspect to 100.
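Put together, the steps above might look like the sketch below. The two function names are made up for illustration. It assumes the colorbar is the last axes on the current figure (as the answer's axes[-1] implies) and that shap_values and Xv are the objects from the question's code. Note that the question imports matplotlib.pyplot as pl, while this answer uses the more common plt alias.

```python
def restore_colorbar_aspect(fig, aspect=1000):
    """Apply the aspect fix to the last axes of `fig`.

    Assumes the colorbar is fig.axes[-1], as in the answer above.
    """
    cbar_ax = fig.axes[-1]
    cbar_ax.set_aspect(aspect)      # aspect of the colorbar itself
    cbar_ax.set_box_aspect(aspect)  # aspect of the colorbar's box
    return cbar_ax


def summary_plot_with_colorbar(shap_values, features, aspect=1000):
    """Draw a SHAP summary plot, then fix the colorbar before showing it."""
    # Imported here so the helper above can be used without these packages.
    import matplotlib.pyplot as plt
    import shap

    # show=False keeps the figure open so it can still be modified.
    shap.summary_plot(shap_values, features, show=False)
    restore_colorbar_aspect(plt.gcf(), aspect)
    plt.show()
```

With the question's variables this would be called as summary_plot_with_colorbar(shap_values, Xv), or with aspect=100 for a thicker bar.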

0

I encountered the same problem. Oddly enough, I had this issue with Python 3.7.9, but when I switched to 3.6.8 it worked well. I'm not sure whether some parts of SHAP's implementation are sensitive to the Python version.

  • Thanks for the suggestion, I was working in 3.9.7 and it happened. I also tried in 3.8.12 and 3.7. I tried going back to 3.6 like you suggested but I'm having some dependency issues with numpy in 3.6 I'll need to work through before I can really even try. – semanning Dec 17 '21 at 21:32