Based on Recursive feature elimination and grid search using scikit-learn, I know that RFECV can be combined with GridSearchCV to find better parameter settings for a model such as a linear SVM.
As stated in the answer there, there are two ways:
1. "Run GridSearchCV on RFECV, which will result in splitting the data into folds two times (once inside GridSearchCV and once inside RFECV), but the search over the number of components will be efficient."
2. "Do GridSearchCV just on RFE, which would result in a single splitting of the data, but in very inefficient scanning of the parameters of the RFE estimator."
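The first way can be sketched as follows, assuming the SVM's C is the parameter being tuned (the grid values and the fold counts are arbitrary choices for illustration). GridSearchCV wraps the RFECV object, and the "estimator__" prefix routes C to the SVC inside it, so the data is split twice: once by GridSearchCV and once more by RFECV inside every GridSearchCV training split.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Inner CV (cv=3): for a given C, RFECV chooses the number of features.
rfecv = RFECV(estimator=SVC(kernel="linear"), step=1, cv=3)

# Outer CV (cv=5): GridSearchCV scores each value of C for the whole
# "RFECV + SVC" procedure.
search = GridSearchCV(rfecv, param_grid={"estimator__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

best_rfecv = search.best_estimator_  # RFECV refit on all of X, y (refit=True)
print("best C:", search.best_params_["estimator__C"])
print("features chosen for that C:", best_rfecv.n_features_)
print("selected features mask:", best_rfecv.support_)
```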
To make my question clear, let me first describe how I understand RFECV to work:
1. Split the whole data into n folds.
2. In every fold, obtain the feature ranking by fitting rfe on the training data only.
3. Sort the ranking, fit the SVM on the training data and score it on the test data. This is done m times, each time with one feature fewer, where m is the number of features (assuming step=1).
4. Step 3 yields a sequence of scores for each fold. After steps 2~3 have been repeated for all n folds, the sequences are averaged across the n folds, and the averaged score sequence suggests the best number of features for rfe.
5. Take that best number of features as the n_features_to_select argument of an rfe fitted on the original whole data.
6. Use .support_ to get the "winners" among the features, and .grid_scores_ to get the averaged score sequence.

Please correct me if I am wrong, thank you.
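The steps above can be sketched as a simplified manual re-implementation; it illustrates the procedure described in the question and is not scikit-learn's internal code. The data, C=1.0 and the 5 folds are illustrative assumptions. Here mean_scores plays the role of what the question calls .grid_scores_; in recent scikit-learn versions that attribute was replaced by cv_results_.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
m = X.shape[1]                      # number of features (step=1)
svc = SVC(kernel="linear", C=1.0)
cv = StratifiedKFold(n_splits=5)    # step 1: n = 5 folds

# scores[i, k - 1] = test score in fold i when keeping the top-k features
scores = np.zeros((cv.get_n_splits(), m))
for i, (train, test) in enumerate(cv.split(X, y)):
    # Step 2: rank features using this fold's training part only.
    # With n_features_to_select=1 and step=1 every feature gets a unique rank.
    ranking = RFE(clone(svc), n_features_to_select=1).fit(X[train], y[train]).ranking_
    # Step 3: fit and score the SVM on the k best-ranked features, k = 1..m.
    for k in range(1, m + 1):
        cols = np.flatnonzero(ranking <= k)
        model = clone(svc).fit(X[train][:, cols], y[train])
        scores[i, k - 1] = model.score(X[test][:, cols], y[test])

# Step 4: average across folds and pick the best number of features.
mean_scores = scores.mean(axis=0)
best_k = int(np.argmax(mean_scores)) + 1

# Step 5: refit RFE on the whole data with that number of features.
final = RFE(clone(svc), n_features_to_select=best_k).fit(X, y)

# Step 6: the "winners" and the averaged score sequence.
print("best number of features:", best_k)
print("support_:", final.support_)
print("averaged scores:", mean_scores)
```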
So my question is: where does GridSearchCV go? My guess is that the second way, "do GridSearchCV just on RFE", means applying GridSearchCV at step 5. It sets the SVM parameter to one of the values in the grid, fits it on a training split produced by GridSearchCV with the number of features suggested in step 4, and tests it on the rest of the data to get a score. This is done k times, where k is the cv argument of GridSearchCV, and the averaged score indicates how good that grid value is. However, the selected features might differ between training splits and grid values, which would make this second way unreasonable if it really works as I guess.
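One common way to set up the second way is sketched below, assuming C and the number of features are searched jointly (grid values and cv=5 are illustrative). The number of features is treated as an ordinary hyperparameter of RFE, so every candidate re-runs the whole elimination from scratch, which is why the scan is inefficient. Within each candidate, RFE is refit on each training split, so the selected features can indeed differ between splits; in this setup the cross-validated score measures the whole "select, then fit" procedure rather than one fixed feature set, and the final feature set comes from refitting the best candidate on all the data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Number of features and C are searched together as plain hyperparameters.
param_grid = {
    "n_features_to_select": list(range(1, X.shape[1] + 1)),
    "estimator__C": [0.1, 1.0, 10.0],
}

# For every candidate and every training split, RFE (ranking + SVC) is refit
# from scratch, so selections may differ between splits.
search = GridSearchCV(RFE(SVC(kernel="linear")), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
# With refit=True (the default), best_estimator_ is RFE refit on all of X, y.
print("final selected features:", search.best_estimator_.support_)
```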
How, then, is GridSearchCV actually combined with RFECV?