The answer to your question lies in the first three lines of the SHAP GitHub project's README:
SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
The story of SHAP started with Scott Lundberg observing the available ML explanation methods and noting that all of them satisfy the additivity property (Definition 1 in the SHAP paper, Lundberg & Lee, 2017):
Additive feature attribution methods have an explanation model that is a linear function of binary variables.
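Concretely, Definition 1 writes the explanation model $g$ as a linear function of simplified binary inputs $z' \in \{0,1\}^M$, where $M$ is the number of features and $\phi_i$ is the attribution of feature $i$:

$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$$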
On top of that, he added three other desirable properties:
Property 1: Local accuracy (the local explanations should add up to the model prediction, as formalized below; equivalent to the original Shapley efficiency axiom)
Property 2: Missingness (a missing feature contributes nothing; close to the original dummy axiom)
Property 3: Consistency (if the model changes so that a feature's marginal contribution increases or stays the same, its attribution should not decrease; close to Young's strong monotonicity)
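Property 1 in formulas: for the instance $x$ being explained (with simplified input $x'$), the attributions add up to the model output,

$$f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i$$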
It turns out that:
- Shapley values satisfy all of Properties 1, 2 and 3 ("satisfy" here means that all four original Shapley axioms hold as soon as Properties 1-3 hold)
- They provide a unique solution (a unique set of marginal contributions), which had already been proved mathematically by Young (1985)
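For reference, the classic Shapley value of feature $i$ averages its marginal contribution over all subsets $S$ of the remaining features $F \setminus \{i\}$, with $f_{S \cup \{i\}}$ a model trained with feature $i$ and $f_S$ one trained without it:

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\left[f_{S\cup\{i\}}(x_{S\cup\{i\}}) - f_S(x_S)\right]$$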
Then, having fixed Shapley values as the solution to the problem of model explainability with the desired Properties 1-3, the question arises:
How do we calculate the model's output with and without a given feature?
Every ML practitioner knows that a model changes if we drop or add a feature. On top of that, for a non-linear model, the order in which we add features matters. So calculating Shapley values exactly, by searching through all $2^M$ feature subsets while retraining a model for each, is computationally prohibitive.
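To make the cost concrete, here is a brute-force sketch of the formula above. Note that train_and_predict is a hypothetical helper, not part of any library: it retrains the model using only the given feature subset and returns the prediction for the single instance being explained (conventionally, the mean prediction for the empty subset).

```python
from itertools import combinations
from math import factorial

def exact_shapley(features, train_and_predict):
    # features: list of feature names.
    # train_and_predict(subset): hypothetical helper that retrains the
    # model using only `subset` and returns its prediction for the
    # instance being explained.
    M = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        contribution = 0.0
        for k in range(M):  # k = |S|, size of coalitions excluding i
            for S in combinations(others, k):
                weight = factorial(k) * factorial(M - k - 1) / factorial(M)
                # Marginal contribution of i given S: two full retrainings.
                gain = train_and_predict(set(S) | {i}) - train_and_predict(set(S))
                contribution += weight * gain
        phi[i] = contribution
    return phi
```

With $M$ features this requires on the order of $M \cdot 2^M$ model retrainings, which is exactly the cost SHAP avoids.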
Now, to the answer to your question, "Difference between Shapley values and SHAP":
SHAP provides a computationally efficient, theoretically robust way to calculate Shapley values for ML by:
- Using a model trained only once (this doesn't apply to the Exact and KernelExplainer explainers)
- Averaging over dropped-out features by sampling from background data, as in the sketch below.
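Below is a minimal sketch of this workflow, assuming the shap library together with a toy scikit-learn model (the dataset, model, and sample sizes are illustrative, not prescriptive):

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# A single model, trained exactly once.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Background data: rows used to simulate "dropped" features by
# averaging the model's output over their background values.
background = shap.sample(X, 50)

# KernelExplainer is model-agnostic: it approximates Shapley values
# by perturbing inputs against the background set -- no retraining.
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:5], nsamples=200)

# Local accuracy (Property 1): per row, shap_values sum to
# model.predict(row) - explainer.expected_value (up to sampling noise).
```

The key design choice is that missing features are marginalized over the background sample rather than retrained away, which is what makes the computation tractable.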
As a side note, the SHAP solution is unique, unlike that of LIME, but this is unrelated to your question.