I am using sklearn.feature_selection.RFECV to reduce the number of features in my final model. With non-cross-validated RFE, you can choose exactly how many features to select. However, with RFECV, you can only specify min_features_to_select, which acts more like a lower bound.
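To make the difference concrete, here is a minimal sketch on synthetic data. The dataset from make_classification and the LogisticRegression estimator are stand-ins chosen only so it runs quickly; they are not my real data or model.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption: not the real X_train / y_train)
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)
est = LogisticRegression(max_iter=1000)

# Plain RFE: the final number of features is fixed up front
rfe = RFE(estimator=est, n_features_to_select=3, step=1).fit(X, y)

# RFECV: only a lower bound can be given; the final count is chosen by CV score
rfecv = RFECV(estimator=est, step=1, min_features_to_select=3,
              cv=5, scoring='roc_auc').fit(X, y)

n_rfe = int(rfe.n_features_)      # always 3, as requested
n_rfecv = int(rfecv.n_features_)  # somewhere between 3 and 10
print(n_rfe, n_rfecv)
```

RFE always ends with the number I asked for, while RFECV ends with some number between the lower bound and the total feature count, and that choice is the part I do not understand.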
So how does RFECV drop features in each iteration? I understand normal RFE, but how does cross-validation come into play?
Here are my instances:
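from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFECV

# Note: loss='deviance' was renamed to 'log_loss' in scikit-learn 1.1 and removed in 1.3
clf = GradientBoostingClassifier(loss='deviance', learning_rate=0.03, n_estimators=500,
                                 subsample=1.0, criterion='friedman_mse', min_samples_leaf=100,
                                 max_depth=7, max_features='sqrt', random_state=123)
rfe = RFECV(estimator=clf, step=1, min_features_to_select=35, cv=5, scoring='roc_auc',
            verbose=1, n_jobs=-1)
rfe.fit(X_train, y_train)  # X_train, y_train: my training features and labels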
I could not find anything more specific in the documentation or user guide.