
There is a proposal to implement this in Sklearn (#15075), but in the meantime eli5 is suggested as a solution. However, I'm not sure whether I'm using it the right way. This is my code:

from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR
import eli5
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
estimator = SVR(kernel="linear")
perm = eli5.sklearn.PermutationImportance(estimator, scoring='r2', n_iter=10, random_state=42, cv=3)
selector = RFECV(perm, step=1, min_features_to_select=1, scoring='r2', cv=3)
selector = selector.fit(X, y)
selector.ranking_
# eli5.show_weights(perm)  # fails: AttributeError: 'PermutationImportance' object has no attribute 'feature_importances_'

There are a few issues:

  1. I am not sure if I am using cross-validation the right way. Is PermutationImportance using cv to validate the importances on a validation set, or should cross-validation be done only in RFECV? (In the example I used cv=3 in both cases, but I'm not sure that's the right thing to do.)

  2. If I uncomment the last line, I get an AttributeError: 'PermutationImportance' object has no attribute 'feature_importances_'. Is this because I fit using RFECV? What I'm doing is similar to the last snippet here: https://eli5.readthedocs.io/en/latest/blackbox/permutation_importance.html

  3. As a less important issue, I get this warning when I set cv in eli5.sklearn.PermutationImportance:

.../lib/python3.8/site-packages/sklearn/utils/validation.py:68: FutureWarning: Pass classifier=False as keyword args. From version 0.25 passing these as positional arguments will result in an error

The whole process is a bit vague. Is there a way to do this directly in Sklearn, e.g. by adding a feature_importances_ attribute?
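
To illustrate what I mean, here is a rough, untested sketch of such a wrapper, built on Sklearn's own permutation_importance (the class name and parameters are my own invention):

from sklearn.base import BaseEstimator, MetaEstimatorMixin, clone
from sklearn.inspection import permutation_importance

class PermutationImportanceEstimator(BaseEstimator, MetaEstimatorMixin):
    # Hypothetical wrapper: fit the estimator, then expose the mean
    # permutation importances as feature_importances_ for RFECV.
    def __init__(self, estimator, scoring=None, n_repeats=5, random_state=None):
        self.estimator = estimator
        self.scoring = scoring
        self.n_repeats = n_repeats
        self.random_state = random_state

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        result = permutation_importance(
            self.estimator_, X, y, scoring=self.scoring,
            n_repeats=self.n_repeats, random_state=self.random_state)
        # Caveat: importances are computed on the training data here,
        # which is exactly the part I am unsure about.
        self.feature_importances_ = result.importances_mean
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

An object like this could then be passed to RFECV like any estimator exposing feature_importances_.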

towi_parallelism

1 Answer


Since the objective is to select the optimal number of features with permutation importance and recursive feature elimination, I suggest using RFECV and PermutationImportance in conjunction with a CV splitter like KFold. The code could then look like this:

import warnings
from eli5 import show_weights
from eli5.sklearn import PermutationImportance
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.svm import SVR


warnings.filterwarnings("ignore", category=FutureWarning)

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

splitter = KFold(n_splits=3) # 3 folds as in the example

estimator = SVR(kernel="linear")
selector = RFECV(
    PermutationImportance(estimator, scoring='r2', n_iter=10, random_state=42, cv=splitter),
    cv=splitter,
    scoring='r2',
    step=1
)
selector = selector.fit(X, y)
selector.ranking_

show_weights(selector.estimator_)
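
Note that show_weights produces an HTML explanation, so it is mainly useful in a Jupyter notebook. In a plain script, something like eli5.format_as_text(eli5.explain_weights(selector.estimator_)) should give a text rendering instead (a sketch of the same idea, not tested here).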

Regarding your issues:

  1. With a shared splitter, PermutationImportance will calculate the feature importances and RFECV the r2 scores on the same splits provided by KFold.

  2. You called show_weights on the unfitted PermutationImportance object; that is why you got the error. You should access the fitted object through RFECV's estimator_ attribute instead.

  3. The FutureWarning can be ignored (or silenced with warnings.filterwarnings as above).
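
Since the end goal is the optimal number of features, the fitted selector can then be inspected like this (grid_scores_ exists in the scikit-learn versions discussed here; later releases replaced it with cv_results_):

print("Optimal number of features:", selector.n_features_)
print("Selected feature mask:", selector.support_)
print("Mean CV score for each number of features:", selector.grid_scores_)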

afsharov
  • Thanks for your complete answer! The reason I used `RFECV` is that I need `grid_scores` at the end to find the optimal number of features, i.e. the score using the most important features at each step. Now I have two ways: I) use your approach and then `cross_val_score(selector.estimator_, X, y, scoring='r2', cv=3)` at each step, or II) use my initial attempt: `selector = RFECV(PermutationImportance(estimator, scoring='r2', n_iter=2, random_state=42, cv=4), step=1, scoring='r2', cv=3)` and then `selector.grid_scores_`. Does that make sense, or can you think of a better way to do it? – towi_parallelism Jun 24 '20 at 11:08
  • Okay, I see. In this case, sticking with `RFECV` is the cleaner approach in my opinion. I would then suggest providing a CV splitter object to both `RFECV` and `PermutationImportance` to make sure all metrics are calculated on the same splits. I will update the answer accordingly. – afsharov Jun 24 '20 at 13:45
  • Thanks! That's exactly the part I'm not sure about. I thought each fold of the `RFECV` is split into smaller k folds when using `cv=k` for `PermutationImportance`, and then feature importance is calculated on that fold's train and test data. Is that not the case? – towi_parallelism Jun 24 '20 at 15:39
  • From my understanding, the answer is yes and no. `RFECV` works essentially in two steps: first, it determines the optimal number of features by fitting across the train folds and choosing the number of features that gives the lowest averaged error across all folds. In this step, you are right: `PermutationImportance` will use smaller folds to compute its values. But after that, a conventional `RFE` model with the previously found optimal number of features is fit on the whole dataset to find the actual features. Now, `PermutationImportance` will use the same splits as in the step before. – afsharov Jun 24 '20 at 18:57
  • This is what I understood from looking into the source code. I do not see a way to ensure the splits are also the same in the first step. This is probably the best compromise you can get unless you define your own method. But you definitely want cross-validation for `PermutationImportance` as well: otherwise, the feature importance is calculated on the data the estimator was trained on and thus does not reflect the importance of features for generalization. – afsharov Jun 24 '20 at 19:04
  • I actually debugged it and checked the `train` and `test` sizes at this line of the code: https://github.com/TeamHG-Memex/eli5/blob/master/eli5/sklearn/permutation_importance.py#L219. So I can confirm that they do split within each split of the `RFECV`, which is kinda OK. I expected that to be the case, but I'm gonna accept your answer. Thanks! – towi_parallelism Jun 29 '20 at 21:58
  • Is there any alternative to avoid this split-in-split situation? Because right now it takes a prohibitive amount of time. – Daniel Wiczew Aug 03 '21 at 05:42