
Recursive feature elimination with cross-validation (RFECV) is taking too long to run on my dataset. How can I speed it up?

X and y

X = df.iloc[:, 7:-2]  # feature columns
y = df["subtype"]     # class labels

X.shape

(867, 142513)

Scale the dataset

# scale the dataset
import pandas as pd
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X = pd.DataFrame(sc.fit_transform(X))

RFE with cross-validation

# Recursive feature elimination with cross-validation
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Create the RFE object and compute a cross-validated score.
svc = SVC(kernel="linear")

min_features_to_select = 7000  # Minimum number of features to consider
rfecv = RFECV(
    estimator=svc,
    step=7,  # number of features removed per iteration
    cv=StratifiedKFold(5),
    scoring="accuracy",  # proportion of correct classifications
    min_features_to_select=min_features_to_select,
)
rfecv.fit_transform(X, y)
melolilili
  • This is likely just because you're fitting many `SVC`s to fairly large datasets: https://stackoverflow.com/q/53940258/10495893 – Ben Reiniger Oct 13 '22 at 12:53
  • Should I perform dimensionality reduction prior to scaling? – melolilili Oct 13 '22 at 13:33
  • I mentally interchanged the number of rows and columns; I'm not sure how well SVC scales to a huge number of columns, and my linked Q discusses many rows. How long does a single SVC fit on your full dataset take? // RFE _is_ a dimensionality reduction, although maybe something "cheaper" would be a good start. Consider also `LinearSVC` instead of `SVC(kernel='linear')`. – Ben Reiniger Oct 14 '22 at 04:56
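
To put a rough number on the "many `SVC` fits" point in the comments, here is a minimal sketch that counts elimination rounds for the settings in the question. `rfe_rounds` is a hypothetical helper, not part of scikit-learn. It assumes each round refits the estimator once, and that a float `step` between 0 and 1 is read as a fraction of the original feature count, rounded down, which is my understanding of the documented behaviour. It ignores the final refit on the full data, so treat the result as an estimate.

```python
import math


def rfe_rounds(n_features, min_features, step):
    """Estimate how many elimination rounds RFE needs (hypothetical helper)."""
    if 0 < step < 1:
        # Assumption: a fractional step is a share of the original
        # feature count, rounded down, with at least one feature removed.
        step = max(1, int(step * n_features))
    return math.ceil((n_features - min_features) / step)


n_folds = 5  # StratifiedKFold(5) in the question

# Settings from the question: 142513 features down to 7000, 7 per round.
rounds_step_7 = rfe_rounds(142513, 7000, 7)
print(rounds_step_7, rounds_step_7 * n_folds)  # 19359 rounds, 96795 fits over 5 folds

# Same target, but removing about 10% of the original features per round.
rounds_step_frac = rfe_rounds(142513, 7000, 0.1)
print(rounds_step_frac, rounds_step_frac * n_folds)  # 10 rounds, 50 fits over 5 folds
```

With `step=7` the cross-validation alone needs on the order of 97,000 estimator fits, so a much larger `step` (or a fractional one) is probably the cheapest change to try, alongside timing a single fit as suggested above.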
