
I am using a scikit extra trees classifier:

model = ExtraTreesClassifier(n_estimators=10000, n_jobs=-1, random_state=0)

Once the model is fitted and used to predict classes, I would like to find out the contributions of each feature to a specific class prediction. How do I do that in scikit-learn? Is it possible with the extra trees classifier or do I need to use some other model?

mprat
user308827

5 Answers


Update

Being more knowledgeable about ML today than I was 2.5 years ago, I will now say this approach only works for highly linear decision problems. If you carelessly apply it to a non-linear problem you will have trouble.

Example: Imagine a feature for which neither very large nor very small values predict a class, but values in some intermediate interval do. That could be water intake to predict dehydration. But water intake probably interacts with salt intake, as eating more salt allows for a greater water intake. Now you have an interaction between two non-linear features. The decision boundary meanders around your feature-space to model this non-linearity, and asking only how much one of the features influences the risk of dehydration simply ignores the interaction. It is not the right question.

Alternative: Another, more meaningful, question you could ask is: if I didn't have this information (if I left out this feature), how much would my prediction of a given label suffer? To do this you simply leave out a feature, train a model, and look at how much precision and recall drop for each of your classes. It still informs you about feature importance, but it makes no assumptions about linearity.
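For illustration, here is a minimal sketch of this leave-one-feature-out check. It uses iris as stand-in data, and per_class_scores is a hypothetical helper; adapt the data and model settings to your own problem:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# iris is only a stand-in; use your own data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def per_class_scores(X_tr, X_te):
    # train on the (possibly reduced) feature set and score per class
    m = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X_tr, y_train)
    pred = m.predict(X_te)
    return (precision_score(y_test, pred, average=None),
            recall_score(y_test, pred, average=None))

base_p, base_r = per_class_scores(X_train, X_test)
for i in range(X_train.shape[1]):
    # drop feature i from both splits, retrain, and measure the damage
    p, r = per_class_scores(np.delete(X_train, i, axis=1),
                            np.delete(X_test, i, axis=1))
    print("feature %d: precision drop %s, recall drop %s"
          % (i, base_p - p, base_r - r))

The larger the drop for a class, the more that class's predictions depended on the left-out feature.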

Below is the old answer.


I worked through a similar problem a while back and posted the same question on Cross Validated. The short answer is that there is no implementation in sklearn that does all of what you want.

However, what you are trying to achieve is really quite simple: multiply the standardised mean of each feature, split by class, with the corresponding element of the model.feature_importances_ array. You can write a simple function that standardises your dataset, computes the mean of each feature split across class predictions, and does element-wise multiplication with the model.feature_importances_ array. The greater the absolute resulting values, the more important the features are to their predicted class, and better yet, the sign will tell you whether it is small or large values that are important.

Here's a super simple implementation that takes a data matrix X, a list of predictions Y and an array of feature importances, and outputs a dict (easily serialised to JSON) describing the importance of each feature to each class.

import numpy as np
from sklearn.preprocessing import scale

def class_feature_importance(X, Y, feature_importances):
    N, M = X.shape
    X = scale(X)

    out = {}
    for c in set(Y):
        # cast the class label to a plain int so json.dumps accepts the keys
        out[int(c)] = dict(
            # one entry per feature (column), hence range(M)
            zip(range(M), np.mean(X[Y == c, :], axis=0) * feature_importances)
        )

    return out

Example:

import numpy as np
import json
from sklearn.preprocessing import scale

X = np.array([[ 2,  2,  2,  0,  3, -1],
              [ 2,  1,  2, -1,  2,  1],
              [ 0, -3,  0,  1, -2,  0],
              [-1, -1,  1,  1, -1, -1],
              [-1,  0,  0,  2, -3,  1],
              [ 2,  2,  2,  0,  3,  0]], dtype=float)

Y = np.array([0, 0, 1, 1, 1, 0])
feature_importances = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])
#feature_importances = model.feature_importances_

result = class_feature_importance(X, Y, feature_importances)

print(json.dumps(result, indent=4))

{
    "0": {
        "0": 0.097014250014533204, 
        "1": 0.16932975630904751, 
        "2": 0.27854300726557774, 
        "3": -0.17407765595569782, 
        "4": 0.0961523947640823, 
        "5": 0.0
    }, 
    "1": {
        "0": -0.097014250014533177, 
        "1": -0.16932975630904754, 
        "2": -0.27854300726557779, 
        "3": 0.17407765595569782, 
        "4": -0.0961523947640823, 
        "5": 0.0
    }
}

The first level of keys in result are class labels, and the second level of keys are column indices, i.e. feature indices. Recall that large absolute values correspond to importance, and the sign tells you whether it's small (possibly negative) or large values that matter.
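For example, to read off the most informative features per class from result, you could sort each inner dict by absolute value (a small usage sketch):

# rank features per class by absolute contribution
for label, contribs in result.items():
    ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    print(label, ranked[:3])  # three most informative features for this class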

Ulf Aslak
  • thanks @Ulf Aslak, can you add a citable reference? i.e. some peer reviewed paper or something similar? – user308827 Mar 29 '16 at 02:39
  • @user308827 to my knowledge there are no references to cite for this small implementation. The code is not doing anything fancy though; it just uses the feature importances given by the model and multiplies them with the mean of each feature split on class, because we can assume that for normalized data, well separated features will have means for each class that are far away from 0. But there is plenty of work presenting methods for class-specific feature selection of various complexity, because after all it's not always this straightforward. – Ulf Aslak Mar 29 '16 at 07:29
  • @UlfAslak When I try to run the exact above code, I am running into an error as follows: TypeError: keys must be str, int, float, bool or None, not int64. Can you please help me out with this? – Vivek Mar 15 '21 at 02:32
  • @Vivek At which line do you get the error? Actually I recommend you do not use this code, as I write in the update it does not take into account non-linearities (and you can betcha they probably exist in your data). It is almost guaranteed you will end up with wrong interpretations if you do not take these into account properly. Check thorbjornwolf's answer below. – Ulf Aslak Mar 17 '21 at 07:43
  • @UlfAslak - I have a similar problem, but I tried using LIME and found some interesting insights. However, the LIME score for the explanation is only between 20-40 for 80% of my records. Still, the local features that it identifies for each of the classes make sense. So, I am planning to use those features and compute the 'pos/neg' ratio for each of the features. Do you think that would make sense? – The Great Mar 29 '22 at 15:07
  • Since LIME allows us to discretize continuous variables into different groups, this helps us know which group (of a continuous variable) influences which class. So, do you think for this sort of insight we still need to consider the LIME score? There is no prediction here. It just discretizes continuous values into buckets, and I use those buckets to see how many times they appear in the positive and negative classes – The Great Mar 29 '22 at 15:14

This is modified from the docs

from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier

iris = datasets.load_iris()  #sample data
X, y = iris.data, iris.target

model = ExtraTreesClassifier(n_estimators=10000, n_jobs=-1, random_state=0)
model.fit(X, y)  # fit the model to the dataset

I think feature_importances_ is what you're looking for:

In [13]: model.feature_importances_
Out[13]: array([ 0.09523045,  0.05767901,  0.40150422,  0.44558631])

EDIT

Maybe I misunderstood the first time (pre-bounty), sorry, this may be more along the lines of what you are looking for. There is a python library called treeinterpreter that produces the information I think you are looking for. You'll have to use the basic DecisionTreeClassifier (or Regressor). Following along from this blog post, you can access the feature contributions in the prediction of each individual instance:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from treeinterpreter import treeinterpreter as ti

iris = datasets.load_iris()  #sample data
X, y = iris.data, iris.target
#split into training and test 
X_train, X_test, y_train, y_test = train_test_split( 
    X, y, test_size=0.33, random_state=0)

# fit the model on the training set
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train,y_train)

I'll just iterate through each sample in X_test for illustrative purposes; this almost exactly mimics the blog post above:

for test_sample in range(len(X_test)):
    prediction, bias, contributions = ti.predict(
        model, X_test[test_sample].reshape(1, -1))
    print("Class Prediction", prediction)
    print("Bias (trainset prior)", bias)

    # now extract the contribution of each feature for this instance
    for c, feature in zip(contributions[0], iris.feature_names):
        print(feature, c)

    print('\n')

The first iteration of the loop yields:

Class Prediction [[ 0.  0.  1.]]
Bias (trainset prior) [[ 0.34  0.31  0.35]]
sepal length (cm) [ 0.  0.  0.]
sepal width (cm) [ 0.  0.  0.]
petal length (cm) [ 0.         -0.43939394  0.43939394]
petal width (cm) [-0.34        0.12939394  0.21060606]

Interpreting this output, it seems as though petal length and petal width were the most important contributors to the prediction of the third class (for the first sample). Hope this helps.
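As a side note, ti.predict should also accept the whole test matrix at once (contributions then comes back with shape (n_samples, n_features, n_classes)), which makes it easy to average contributions over many samples; a short sketch reusing model, X_test and iris from above:

import numpy as np

# one call over the whole test set
prediction, bias, contributions = ti.predict(model, X_test)

# average absolute contribution of each feature to each class
mean_contrib = np.abs(contributions).mean(axis=0)
for feature, per_class in zip(iris.feature_names, mean_contrib):
    print(feature, per_class)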

Kevin
  • 1
    thanks @Kevin, I do not think feature_importances_ gives me what I want. Just found something which might: http://blog.datadive.net/interpreting-random-forests/ – user308827 Feb 07 '16 at 16:00
  • thanks @Kevin, treeinterpreter does not work with the extra trees classifier that I am using – user308827 Mar 22 '16 at 21:33
  • 1
    I offered it as a suggestion because of the last statement, "do I need to use some other model?". Maybe someone more knowledgeable about scikit will be able to provide you a more detailed answer. – Kevin Mar 22 '16 at 22:09
  • As I've outlined in my answer, this is not implemented in sklearn. But it's really simple once you have the feature importances: you just find the mean value of each feature for each predicted class and multiply by the corresponding feature importance. – Ulf Aslak Mar 27 '16 at 11:08
  • 1
    @user308827 there is an implementation of treeinterpreter-like algorithm in https://github.com/TeamHG-Memex/eli5; it works with ExtraTreesClassifier. – Mikhail Korobov Jan 12 '17 at 22:49

The paper "Why Should I Trust You?": Explaining the Predictions of Any Classifier was submitted 9 days after this question, providing an algorithm for a general solution to this problem! :-)

In short, it is called LIME for "local interpretable model-agnostic explanations", and works by fitting a simpler, local model around the prediction(s) you want to understand.

What's more, they have made a python implementation (https://github.com/marcotcr/lime) with pretty detailed examples on how to use it with sklearn. For instance, this one is on a two-class random forest problem on text data, and this one is on continuous and categorical features. They are all to be found via the README on github.
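To give a flavour of the API, here is a minimal sketch of LIME on the iris/ExtraTrees setup from the other answers (X_train, X_test, model and iris are assumed to exist as in those snippets; see the README for the authoritative usage):

from lime.lime_tabular import LimeTabularExplainer

# build the explainer on the training data
explainer = LimeTabularExplainer(
    X_train,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    discretize_continuous=True)

# explain a single prediction of the fitted model
exp = explainer.explain_instance(X_test[0], model.predict_proba, num_features=4)
print(exp.as_list())  # [(feature description, signed contribution), ...]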

The authors had a very productive year in 2016 concerning this field, so if you like reading papers, the LIME paper above is a good place to start.

thorbjornwolf
  • I have a similar problem, but I tried using LIME and found some interesting insights. However, the LIME score for the explanation is only between 20-40 for 80% of my records. Still, the local features that it identifies for each of the classes make sense. So, I am planning to use those features and compute the 'pos/neg' ratio for each of the features. Do you think that would make sense? – The Great Mar 29 '22 at 15:07
  • @TheGreat No clue ^_^ I haven't touched this since making this answer nearly 4 years ago. For what it is worth, I've heard about [shap](https://shap.readthedocs.io/en/latest/index.html) as a viable alternative, so if you have time you could also consider that avenue. Good luck! – thorbjornwolf Mar 30 '22 at 11:15

So far I have been checking eli5 and treeinterpreter (both have been mentioned before), and I think eli5 will be the most helpful, because I think it has more options and is more generic and up to date.

Nevertheless, after some time applying eli5 to a particular case, I could not obtain negative contributions for ExtraTreesClassifier. Researching a little bit more, I realised I was obtaining the importance or weight, as seen here. I was more interested in something like a contribution, as mentioned in the title of this question: I understand some features could have a negative effect, but when measuring importance the sign is not considered, so features with positive and negative effects are put together.

Because I was very interested in the sign, I did as follows: 1) obtain the contributions for all cases, 2) aggregate all the results into an average so the signs can be distinguished. Not a very elegant solution; there is probably something better out there, but I post it here in case it helps.

I reproduce the same setup as the previous post.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)
import eli5
import pandas as pd


iris = datasets.load_iris()  #sample data
X, y = iris.data, iris.target
#split into training and test 
X_train, X_test, y_train, y_test = train_test_split( 
    X, y, test_size=0.33, random_state=0)

# fit the model on the training set
#model = DecisionTreeClassifier(random_state=0)
model = ExtraTreesClassifier(n_estimators=100)

model.fit(X_train,y_train)


aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(model, X[0], top=X.shape[1])

aux1

With output: an HTML table of per-class feature weights (a DataFrame version is shown below).

The previous result is for one case; I want to run all of them and create an average:

This is how a dataframe with the results looks:

aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(model, X[0], top=X.shape[0])
aux1 = eli5.format_as_dataframe(aux1)
# aux1.index = aux1['feature']
# del aux1['target']
aux1


target  feature weight  value
0   0   <BIAS>  0.340000    1.0
1   0   x3  0.285764    0.2
2   0   x2  0.267080    1.4
3   0   x1  0.058208    3.5
4   0   x0  0.048949    5.1
5   1   <BIAS>  0.310000    1.0
6   1   x0  -0.004606   5.1
7   1   x1  -0.048211   3.5
8   1   x2  -0.111974   1.4
9   1   x3  -0.145209   0.2
10  2   <BIAS>  0.350000    1.0
11  2   x1  -0.009997   3.5
12  2   x0  -0.044343   5.1
13  2   x3  -0.140554   0.2
14  2   x2  -0.155106   1.4

So I create a function to combine tables of the previous kind:

def concat_average_dfs(aux2, aux3):
    # Put the same index on both frames.
    # The try/except is there because I want to use this function
    # recursively, and I could pass in dataframes that already have
    # these indexes. This is not the best way.
    try:
        aux2.set_index(['feature', 'target'], inplace=True)
    except KeyError:
        pass
    try:
        aux3.set_index(['feature', 'target'], inplace=True)
    except KeyError:
        pass
    # Concatenate and take the mean
    aux = pd.DataFrame(pd.concat([aux2['weight'], aux3['weight']])
                       .groupby(level=[0, 1]).mean())
    # Return in order
    #return aux.sort_values(['weight'], ascending=[False], inplace=True)
    return aux
aux2 = aux1.copy(deep=True)
aux3 = aux1.copy(deep=True)

concat_average_dfs(aux3,aux2)

(Output: a table of weights averaged over the two copies, indexed by feature and target.)

So now I only have to use the previous function with all the examples I wish. I will take the whole population, not only the training set, to check the average effect in all real cases:

for i in range(X.shape[0]):
    aux1 = eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(
        model, X[i], top=X.shape[0])
    aux1 = eli5.format_as_dataframe(aux1)

    if 'aux_total' in locals():
        aux_total = concat_average_dfs(aux1, aux_total)
    else:
        aux_total = aux1

With result:

(Output: the averaged weight of each feature for each target over the whole dataset.)

The last table shows the average effect of each feature over my whole real population.
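As an aside, a possibly simpler alternative to the recursive merge is to collect every per-sample dataframe in a list and do a single groupby at the end; a sketch under the same assumptions as the loop above:

# collect one explanation frame per sample, then average in one shot
frames = [
    eli5.format_as_dataframe(
        eli5.sklearn.explain_prediction.explain_prediction_tree_classifier(
            model, X[i], top=X.shape[1]))  # top = number of features
    for i in range(X.shape[0])
]
average_effects = (pd.concat(frames)
                   .groupby(['feature', 'target'])['weight']
                   .mean())
print(average_effects)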

Companion notebook in my github.

Rafael Valero

As @thorbjornwolf showed, a method called LIME, including a Python library, exists for such a problem. Another library for this problem is SHAP, which computes Shapley values. Both libraries look viable and offer a complete solution to this problem.
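For illustration, a minimal sketch of how SHAP's TreeExplainer could be applied to a fitted tree ensemble model and data matrix X like those in the other answers (the exact API differs between shap versions):

import shap

# explain the fitted tree ensemble; for a multiclass model, older shap
# versions return one array of shape (n_samples, n_features) per class
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# shap_values[c][i, j]: signed contribution of feature j to class c for sample i
shap.summary_plot(shap_values, X)  # global view across all classes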

Adept
  • You should give an example or a more comprehensive explanation of why and how to use SHAP; otherwise yours is not really an answer, rather a comment. – Ulf Aslak Mar 17 '21 at 07:48