I'm running a grid search to tune my model, but it is taking far too long to finish. My dataset is small: about 15,000 observations and 30 to 40 variables. A random forest went through the grid search successfully in about an hour and a half, but since switching to SVC the search has already run for over 9 hours and still isn't complete. Here is the code for the cross-validated search:
```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

SVM_Classifier = SVC(random_state=7)

# Every combination of these values is fitted once per CV fold.
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [1, 0.1, 0.01, 0.001],
              'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
              'degree': [0, 1, 2, 3, 4, 5, 6]}

grid_obj = GridSearchCV(SVM_Classifier,
                        return_train_score=True,
                        param_grid=param_grid,
                        scoring='roc_auc',
                        cv=3,
                        n_jobs=-1)  # use all CPU cores
grid_fit = grid_obj.fit(X_train, y_train)
SVMC_opt = grid_fit.best_estimator_

print('=' * 20)
print("best estimator: " + str(grid_obj.best_estimator_))
print("best params: " + str(grid_obj.best_params_))
print('best score:', grid_obj.best_score_)
print('=' * 20)
```
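One way to see where the time goes is to count the fits this grid implies. `GridSearchCV` tries every combination, so the grid above yields 4 × 4 × 4 × 7 = 448 candidates, or 1,344 fits at `cv=3` (plus one final refit). In scikit-learn's `SVC`, `degree` is used only by the `'poly'` kernel and `gamma` is ignored by `'linear'`, so many of those candidates are duplicates of each other. The sketch below counts candidates in plain Python (scikit-learn's `ParameterGrid` reports the same number via `len()`) and compares the original grid with a hypothetical list-of-dicts grid, `conditional_grid`, that varies each parameter only for the kernels that use it.

```python
from math import prod

# The grid from the question: every key is crossed with every other key.
original_grid = {'C': [0.1, 1, 10, 100],
                 'gamma': [1, 0.1, 0.01, 0.001],
                 'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
                 'degree': [0, 1, 2, 3, 4, 5, 6]}

# Hypothetical restructured grid: GridSearchCV also accepts a list of dicts
# and searches each dict separately, so 'degree' is only varied for 'poly'
# and 'gamma' is not varied for 'linear'.
C_values = [0.1, 1, 10, 100]
gamma_values = [1, 0.1, 0.01, 0.001]
conditional_grid = [
    {'kernel': ['linear'], 'C': C_values},
    {'kernel': ['rbf', 'sigmoid'], 'C': C_values, 'gamma': gamma_values},
    {'kernel': ['poly'], 'C': C_values, 'gamma': gamma_values,
     'degree': [0, 1, 2, 3, 4, 5, 6]},
]


def count_candidates(grid):
    """Number of parameter combinations a grid search would evaluate."""
    if isinstance(grid, dict):
        grid = [grid]
    return sum(prod(len(values) for values in sub.values()) for sub in grid)


cv = 3
for name, grid in [('original', original_grid),
                   ('conditional', conditional_grid)]:
    n = count_candidates(grid)
    print(f"{name}: {n} candidates x {cv} folds = {n * cv} fits")
# original: 448 candidates x 3 folds = 1344 fits
# conditional: 148 candidates x 3 folds = 444 fits
```

The conditional grid still covers every distinct model in the original grid, but needs roughly a third of the fits.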
I have already reduced cross-validation from 10 folds to 3, and I'm using n_jobs=-1 to run on all of my cores. Is there anything else I'm missing that could speed up the search?
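Two further levers, sketched under stated assumptions. First, SVMs are not scale invariant, and if the features are not already standardized (the post doesn't say), unscaled inputs can slow libsvm's convergence; putting `StandardScaler` in a pipeline refits it on each training fold so nothing leaks from the held-out fold. Second, `RandomizedSearchCV` evaluates a fixed number of sampled candidates (`n_iter`) instead of the whole grid. In this sketch, synthetic data from `make_classification` stands in for `X_train`/`y_train`, and `n_iter=20` plus the narrowed `degree` values are arbitrary illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the real X_train / y_train (binary target).
X_train, y_train = make_classification(n_samples=500, n_features=30,
                                       random_state=7)

# The scaler is fitted inside each CV training fold, then applied to the SVC.
pipe = make_pipeline(StandardScaler(), SVC())

# make_pipeline names the SVC step 'svc', hence the 'svc__' prefixes.
# 'degree' is narrowed to [2, 3] here purely for illustration.
param_distributions = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 1, 10, 100]},
    {'svc__kernel': ['rbf', 'sigmoid'], 'svc__C': [0.1, 1, 10, 100],
     'svc__gamma': [1, 0.1, 0.01, 0.001]},
    {'svc__kernel': ['poly'], 'svc__C': [0.1, 1, 10, 100],
     'svc__gamma': [1, 0.1, 0.01, 0.001], 'svc__degree': [2, 3]},
]

# Evaluate only n_iter sampled candidates instead of the full grid.
search = RandomizedSearchCV(pipe, param_distributions, n_iter=20,
                            scoring='roc_auc', cv=3, n_jobs=-1,
                            random_state=7)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best score:", search.best_score_)
```

With `n_iter=20` and `cv=3` this runs 60 fits regardless of how large the grid is, so the cost is set by `n_iter` rather than by the number of combinations.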