So when feature_perturbation='tree_path_dependent', the data argument is optional. If we do pass a background dataset, do we get the same behaviour as with feature_perturbation='interventional'?
Judging from my minimal example, that seems to be the case, at least for expected_value:
import shap
import numpy as np
from sklearn.tree import DecisionTreeRegressor
num_points = 500
num_samples = 100
num_features = 5
rng = np.random.default_rng(seed=1)
X = rng.normal(size=(num_points, num_features))
y = rng.integers(2, size=(num_points,))
# use the seeded generator for the background sample as well, for reproducibility
X_sample = X[rng.integers(X.shape[0], size=num_samples), :]
dt_model = DecisionTreeRegressor(max_depth=2).fit(X, y)
# 1) path-dependent, no background data
explainer1 = shap.TreeExplainer(dt_model, feature_perturbation='tree_path_dependent', model_output='raw')
# 2) path-dependent, but with a background dataset supplied
explainer2 = shap.TreeExplainer(dt_model, feature_perturbation='tree_path_dependent', data=X_sample, model_output='raw')
# 3) interventional, with the same background dataset
explainer3 = shap.TreeExplainer(dt_model, feature_perturbation='interventional', data=X_sample, model_output='raw')
print(f'explainer1.expected_value = {explainer1.expected_value}')
print(f'explainer2.expected_value = {explainer2.expected_value}')
print(f'explainer3.expected_value = {explainer3.expected_value}')
which prints:

explainer1.expected_value = [0.514]
explainer2.expected_value = 0.5139024767801856
explainer3.expected_value = 0.5139024767801856
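To check whether the attributions themselves also coincide (not just expected_value), one could extend the example along these lines. This is just a sketch: it assumes the classic TreeExplainer.shap_values API and simply compares the two outputs numerically, without asserting what the result should be.

# Sketch: compare the SHAP values of explainer2 (tree_path_dependent + data)
# and explainer3 (interventional + data) on the same points.
shap_values2 = explainer2.shap_values(X_sample)
shap_values3 = explainer3.shap_values(X_sample)

# If supplying `data` silently switches to the interventional algorithm,
# these should agree up to numerical tolerance.
print('shap_values equal:', np.allclose(shap_values2, shap_values3))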