
I am using sklearn.feature_selection.RFECV to reduce the number of features in my final model. With plain (non-cross-validated) RFE, you can choose exactly how many features to select. With RFECV, however, you can only specify min_features_to_select, which acts as a lower bound rather than an exact target.

So how does RFECV drop features in each iteration? I understand plain RFE, but how does cross-validation come into play?

Here are my estimator and selector instances:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV

clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.03, n_estimators=500,
                                 subsample=1.0, criterion='friedman_mse', min_samples_leaf=100,
                                 max_depth=7, max_features='sqrt', random_state=123)
rfe = RFECV(estimator=clf, step=1, min_features_to_select=35, cv=5, scoring='roc_auc',
            verbose=1, n_jobs=-1)
rfe.fit(X_train, y_train)  # X_train, y_train defined elsewhere

I could not find anything more specific in the documentation or user guide.

Arturo Sbr

1 Answer


Your guess (since edited out of the question) describes an algorithm that cross-validates the elimination step itself, but that is not how RFECV works. (Such an algorithm might indeed stabilize RFE itself, but it wouldn't tell you the optimal number of features, and that is the goal of RFECV.)

Instead, RFECV runs a separate RFE on each of the training folds, eliminating down to min_features_to_select. These RFEs are very likely to produce different orders of elimination and different final feature sets, but none of that is taken into consideration: only the scores of the resulting models on the corresponding test fold, at each number of features, are retained. (Note that RFECV has a scoring parameter that RFE lacks.) Those scores are then averaged across folds, and the number of features with the best average score becomes n_features_. Finally, one last RFE is run on the entire dataset with that target number of features.

source code
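
For concreteness, here is a minimal sketch of that procedure. It is not scikit-learn's actual implementation (which records test-fold scores during a single elimination pass per fold rather than refitting an RFE at every feature count), and the function name rfecv_sketch is made up for illustration. It assumes X and y are NumPy arrays and a binary target, to match the roc_auc scoring in the question:

import numpy as np
from sklearn.base import clone
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def rfecv_sketch(clf, X, y, min_features=35, step=1, n_splits=5):
    # Feature counts evaluated: min_features, min_features + 1, ..., n_features
    counts = list(range(min_features, X.shape[1] + 1, step))
    fold_scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        scores = []
        for k in counts:
            # With step=1 the elimination path is nested, so refitting an
            # RFE per feature count visits the same subsets scikit-learn
            # records in one pass down to min_features_to_select.
            rfe = RFE(clone(clf), n_features_to_select=k, step=step)
            rfe.fit(X[train_idx], y[train_idx])
            proba = rfe.predict_proba(X[test_idx])[:, 1]
            scores.append(roc_auc_score(y[test_idx], proba))
        fold_scores.append(scores)
    # Average each feature count's test-fold scores across folds; the
    # per-fold feature subsets may differ, but only the scores matter here.
    mean_scores = np.mean(fold_scores, axis=0)
    best_k = counts[int(np.argmax(mean_scores))]
    # Final RFE on the full dataset with the winning number of features.
    return RFE(clone(clf), n_features_to_select=best_k, step=step).fit(X, y)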

Ben Reiniger
  • Within each separate RFE process, how do all features end up getting an importance score, if the scores come from the resulting model (with a subset of features) on the test fold? – oustella Aug 03 '22 at 15:19
  • @oustella They don't get a global importance score. They get a local score that's used in the next elimination step. The global score is just "the score for `k` features on this fold", and those are aggregated across folds for the single `k`, even though those `k` features are (probably) different across folds. – Ben Reiniger Aug 03 '22 at 15:24
  • Thank you for further explaining! I understand those k features may be different for each training fold. Just confused about how a single RFE was able to assign feature importance to *all* features that it recursively eliminates (this is seen in the RFECV.cv_results_ for each split). Does it average the local importance from each elimination step? Looking at the source code, I'm not convinced that's the case. – oustella Aug 03 '22 at 15:40
  • @oustella `cv_results_` doesn't contain any information about feature importances, just the model's scores. Entry `i` in each value of that dict corresponds to `min_features_to_select + i` features. (This could use better documentation!) [See the sketch below this thread.] – Ben Reiniger Aug 03 '22 at 15:57
  • Oh wow! That blew my mind. Thanks for pointing that out! To be sure: the score is whatever the `scoring` parameter specifies, correct? When unspecified, it defaults to the estimator's `score` method? – oustella Aug 03 '22 at 16:29
  • @oustella yeah, that looks right wrt scoring – Ben Reiniger Aug 03 '22 at 16:35
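
A minimal illustration of the indexing described in that thread (assuming scikit-learn ≥ 1.0, where cv_results_ replaced grid_scores_, and the fitted rfe object with step=1 from the question):

import numpy as np

# `rfe` is the fitted RFECV instance from the question. With step=1,
# entry i of each score array in cv_results_ corresponds to
# min_features_to_select + i features; the arrays hold test-fold scores
# only, never feature importances.
mean_scores = rfe.cv_results_["mean_test_score"]
counts = rfe.min_features_to_select + np.arange(len(mean_scores))
print(counts[np.argmax(mean_scores)])  # agrees with rfe.n_features_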