I'm trying to optimize a script that runs RFE with GridSearchCV, but it is taking too long (~12 hrs). My input data, residuals, has shape (231, 4950), and y_irep_mvpa has shape (231,).
What can I do to decrease the time?
Code snippet:
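from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
# residuals: array of shape (231, 4950); y_irep_mvpa: labels of shape (231,)
clf = SVC(kernel='linear')
pipe = make_pipeline(StandardScaler(), RFE(estimator=clf))
parameters = {'rfe__n_features_to_select': range(1, 4950)}
grid = GridSearchCV(pipe, param_grid=parameters, cv=5, n_jobs=-1)
grid.fit(residuals, y_irep_mvpa)
print('Best parameters:', grid.best_params_)
print('Best accuracy:', grid.best_score_)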
I requested a larger core allocation (for n_jobs) from the HPC cluster I am running this script on, but the job still failed due to the time limit. I also tried decreasing cv from 10 to 5, as suggested here: Is there a quicker way of running GridsearchCV, but that doesn't seem to have helped.
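For scale, here is a rough back-of-the-envelope count of how many SVC fits this grid implies. It is only a sketch under assumptions: RFE keeps its default step=1, every candidate value of n_features_to_select reruns the elimination from scratch in every fold, and each RFE run does one extra fit on the final feature subset. The helper rfe_fits is a hypothetical name for illustration, not part of scikit-learn.

```python
import math

n_features = 4950   # columns in residuals
cv_folds = 5        # cv=5 in GridSearchCV
step = 1            # assumed: RFE's default step

def rfe_fits(n_features, n_select, step=1):
    """Approximate estimator fits for one RFE run (hypothetical helper)."""
    # One fit per elimination round, plus one final fit on the kept features.
    eliminations = math.ceil((n_features - n_select) / step)
    return eliminations + 1

# Same candidate values as the grid: range(1, 4950)
candidates = range(1, 4950)
fits_per_fold = sum(rfe_fits(n_features, k, step) for k in candidates)
total_fits = fits_per_fold * cv_folds

print(f"candidate settings: {len(candidates)}")
print(f"approx. SVC fits per fold: {fits_per_fold:,}")
print(f"approx. SVC fits over {cv_folds} folds: {total_fits:,}")
```

If those assumptions hold, the count runs into the tens of millions of fits, which would explain why extra cores and fewer folds barely make a dent.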