
Background of the Problem

I want to explain the output of machine learning (ML) models using SHapley Additive exPlanations (SHAP), which is implemented in the shap library for Python. As a parameter of the function shap.Explainer(), I need to pass an ML model (e.g. XGBRegressor()). However, in each iteration of Leave One Out Cross Validation (LOOCV), the ML model will be different, because in each iteration I am training on a different dataset (one participant's data is left out). Also, the model will be different because I am doing feature selection in each iteration.
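For illustration, my setup looks roughly like the following sketch (simplified: the data shapes, SelectKBest, and k=10 are just placeholders for my actual data and feature selection; the model is XGBRegressor as mentioned above):

```python
# Minimal sketch of the LOOCV setup (placeholder data and feature selection).
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.feature_selection import SelectKBest, f_regression
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 20))  # 250 participants, 20 candidate features (made-up shapes)
y = rng.normal(size=250)

for train_idx, test_idx in LeaveOneOut().split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # feature selection is repeated in every iteration,
    # so the selected columns may differ between folds
    selector = SelectKBest(f_regression, k=10).fit(X_train, y_train)
    X_train_sel = selector.transform(X_train)
    X_test_sel = selector.transform(X_test)

    # a new model is trained in every iteration
    model = XGBRegressor().fit(X_train_sel, y_train)
    y_pred = model.predict(X_test_sel)
```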

My Question

In LOOCV, how can I use the shap.Explainer() function of the shap library to explain the output of a machine learning model? Note that I have checked several tutorials (e.g. this one, this one) and also several SO questions (e.g. this one), but I failed to find an answer to this problem.

Thanks for reading!


Update

I know that in LOOCV, the model found in each iteration can be explained by shap.Explainer(). However, as there are data from 250 participants, if I apply SHAP to each model there will be 250 outputs! Thus, I want a single output that summarizes the explanations of the 250 models.
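Continuing the sketch above, the per-fold explanation I already know how to do looks roughly like this, and it leaves me with 250 separate outputs:

```python
import shap

per_fold_shap = []  # collected across the LOOCV loop from the sketch above

# inside the loop, after fitting `model` on the fold's selected features:
explainer = shap.Explainer(model)
fold_explanation = explainer(X_test_sel)       # SHAP values for the 1 left-out participant
per_fold_shap.append(fold_explanation.values)  # after the loop: 250 separate outputs
```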

Md. Sabbir Ahmed
  • LOOCV is still a CV, where you leave out a data point, train a model, and check the model's generalization ability on the left-out data. There is no problem in applying SHAP in LOOCV. I suggest you either check your definitions or describe the exact problem you have. – Sergey Bushmanov Jun 23 '21 at 07:15

1 Answer


You seem to be training a model on 250 data points while doing LOOCV. Cross-validation is about choosing a model with hyperparameters that will ensure the best generalization ability.

Model explanation is different from training in that you don't sift through different sets of hyperparameters -- note, 250-fold LOOCV is already overkill; will you do that with 250,000 rows? -- you are instead trying to understand which features influence the output, in what direction, and by how much.

Training has its own limitations (availability of data, whether new data resembles the data the model was trained on, whether the model is good enough to pick up the peculiarities of the data and generalize well, etc.), but don't overestimate the explanation exercise either. It's still just an attempt to understand how inputs influence outputs. You may be willing to average 250 different matrices of SHAP values, but do you expect the result to be much different from a single random train/test split?
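For instance, if you do want that average, a rough sketch might look like the following (assuming every fold ends up with the same feature set so that the 250 matrices are comparable; the random data is only a stand-in for your per-fold results):

```python
import numpy as np

# Stand-in for the 250 per-fold SHAP value arrays, each of shape (1, n_features);
# replace this with the values you actually collected during LOOCV.
n_features = 10
per_fold_shap = [np.random.rand(1, n_features) for _ in range(250)]

all_shap = np.vstack(per_fold_shap)            # shape (250, n_features)
mean_abs_shap = np.abs(all_shap).mean(axis=0)  # the usual "global importance" view
mean_raw_shap = all_shap.mean(axis=0)          # raw mean -- expect values close to 0
```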

Note as well:

However, in each iteration of Leave One Out Cross Validation (LOOCV), the ML model will be different, because in each iteration I am training on a different dataset (one participant's data is left out).

In each iteration of LOOCV the model is still the same (same features; hyperparameters may differ, depending on your definition of iteration). It's still the same dataset (same features).

Also, the model will be different because I am doing feature selection in each iteration.

It doesn't matter. Feed the resulting model to the SHAP explainer and you'll get what you want.
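For example, a minimal sketch (the random data is a placeholder, and XGBRegressor stands in for whatever model you end up with after CV and feature selection):

```python
import numpy as np
import shap
from xgboost import XGBRegressor

# placeholder data -- use the dataset and features you settled on after CV
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 10))
y = rng.normal(size=250)

model = XGBRegressor().fit(X, y)   # the resulting/final model
explainer = shap.Explainer(model)
shap_values = explainer(X)

shap.plots.beeswarm(shap_values)   # one global explanation instead of 250
```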

Sergey Bushmanov
  • Hi Sergey Bushmanov, +1 for your nice answer. And also thanks for your suggestion of averaging the SHAP values. But I have two concerns. 1) When in each iteration the model gets a different participant's data, how do you say that _the model is still the same_? 2) As the feature selection can return a different set of features in some iterations, how do you say that _it doesn't matter_? E.g. in the 1st iteration it selects f1 and f2; in the 2nd iteration it selects f2 and f3 as important features. – Md. Sabbir Ahmed Jun 23 '21 at 08:30
  • SHAP is not for explaining CV, and I tried to make that clear in the beginning. Averaging CV results over different sets of hyperparameters tells you which set should be chosen, and SHAP is the wrong tool here. When you're done with CV, you're welcome to feed your model to the SHAP explainer. – Sergey Bushmanov Jun 23 '21 at 08:36
  • Hi Sergey Bushmanov, I am sorry for asking you another question. Do you see any problem if the average of the SHAP values is calculated? It would produce one output presenting the average SHAP value (over the 250 models) for each of the features. – Md. Sabbir Ahmed Jun 23 '21 at 08:48
  • You're free to average SHAP values any way you wish: mean of the raw values (expect the resulting SHAP values to be close to 0), mean/median of absolute values, and many others, or the mean of 250 matrices as you insist from the very beginning. My question is: what is the point of this fancy exercise if you would most probably achieve the same with a single *random* train/test split? Practically speaking, when you're doing a Big Data exercise on 12 PB of data, what will be your course of action? – Sergey Bushmanov Jun 23 '21 at 08:56
  • Thank you so much for your response, and also thanks for asking the question. 1) If I get the same result, then I think there is no need to calculate the average. However, I found that sometimes the set of important features differs even when setting the same `random_state` for the feature selection algorithm. Thus, I think the SHAP values can also differ. 2) Surely for 12 PB of data I will not use LOOCV :). – Md. Sabbir Ahmed Jun 23 '21 at 09:08
  • If the data and model are the same, the SHAP values are the same. Sometimes setting `random_seed` is not enough. – Sergey Bushmanov Jun 23 '21 at 09:10