
The `PermutationImportance` object has some nice attributes such as `feature_importances_` and `feature_importances_std_`.

To visualize these attributes in an HTML format, I used the `eli5.show_weights` function. However, I noticed that the displayed standard deviation does not agree with the values in `feature_importances_std_`.

More specifically, I can see that the displayed HTML values are equal to `feature_importances_std_ * 2`. Why is that?

Code:

from sklearn import datasets
import eli5
from eli5.sklearn import PermutationImportance
from sklearn.svm import SVC

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

clf = SVC()
perms = PermutationImportance(clf, n_iter=1000, cv=10, random_state=0).fit(X, y)

print(perms.feature_importances_)          # the actual mean
print(perms.feature_importances_std_)      # the actual SD
print(perms.feature_importances_std_ * 2)  # the values displayed by show_weights()

Output:

[0.39527333 0.17178   ]  # the actual mean
[0.13927548 0.11061278]  # the actual SD
[0.27855095 0.22122556]  # the values displayed by `show_weights()`

eli5.show_weights(perms)

We can see that the displayed standard deviation is doubled, i.e. `2 * perms.feature_importances_std_`.
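
To double-check that the doubling happens only at display time, we can inspect the raw explanation object behind `show_weights()` (a minimal sketch; it assumes `eli5.explain_weights` returns an `Explanation` whose `feature_importances.importances` entries expose `.weight` and `.std`):

    # Sketch: inspect the raw data that show_weights() renders
    expl = eli5.explain_weights(perms)
    for fw in expl.feature_importances.importances:
        print(fw.feature, fw.weight, fw.std)
    # The .std values printed here match perms.feature_importances_std_,
    # i.e. they are NOT doubled in the underlying data.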

Is this a bug maybe?

[screenshot of the `show_weights()` output]

seralouk
  • Have you tried with explain_weights in place of show_weights? Are the results same? – DrSpill Mar 05 '20 at 08:10
  • I have tried, yes. Nothing changes. See here: https://pasteboard.co/IXF26AD.png – seralouk Mar 05 '20 at 08:55
  • For feature importances, you can also use [rfpimp](https://pypi.org/project/rfpimp/). – Tiago Martins Peres Mar 05 '20 at 12:33
  • Is that only for random forest models? **Funny thing**, they also use `eli5`. See last code cell [here](https://github.com/parrt/random-forest-importances/blob/master/notebooks/pimp.ipynb) – seralouk Mar 05 '20 at 16:20
  • I tried your code and replaced SVC() with RandomForestClassifier(), and the `x2` is still there. Therefore it's probably in the display and not in the computation. – bendaizer Mar 09 '20 at 14:58
  • Yes, I have also opened a GitHub issue where I explain line by line that it's not a model-related problem. – seralouk Mar 10 '20 at 08:03

1 Answer


Found the `*2`: it's in the template that generates the feature importances HTML table, in the following file:

https://github.com/TeamHG-Memex/eli5/blob/63e99182dc682bbf225355c80a24807396a747b6/eli5/templates/feature_importances.html

        {% if not fw.std is none %}
            ± {{ "%0.4f"|format(2 * fw.std) }}
        {% endif %}

The factor of 2 is clearly hard-coded. A plausible (but undocumented) reading is that `± 2 * std` is meant as an approximate 95% interval, i.e. ±2σ.
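
If you want a table with one standard deviation instead, a simple workaround is to format the values yourself from the fitted `PermutationImportance` attributes (a minimal sketch using only the attributes shown in the question):

    # Sketch: print mean importance ± 1 SD directly,
    # bypassing the template's hard-coded factor of 2
    for name, mean, std in zip(iris.feature_names[:2],
                               perms.feature_importances_,
                               perms.feature_importances_std_):
        print(f"{name}: {mean:.4f} ± {std:.4f}")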

bendaizer
  • Great. That's what I needed to know. It's still unclear (probably a bug) why they do it. My GitHub issue is still open. Your answer is spot on, so I accept it. – seralouk Mar 10 '20 at 08:04
  • Yeah that's odd. I actually don't see the logic, and there is nothing written anywhere about it. I'll have a look at your github issue. – bendaizer Mar 10 '20 at 15:01