I'm optimizing some paramters for an SVC in sklearn, and the biggest issue here is having to wait 30 minutes before I try out any other parameter ranges. Worse is the fact that I'd like to try more values for c and gamma within the same range (so I can create a smoother surface plot) but I know that it will just take longer and longer... When I ran it today I changed the cache_size from 200 to 600 (without really knowing what it does) to see if it made a difference. The time decreased by about a minute.
Is this something I can help? Or am I just gonna have to deal with a very long time?
clf = svm.SVC(kernel="rbf" , probability = True, cache_size = 600)
gamma_range = [1e-7,1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1e0,1e1]
c_range = [1e-3,1e-2,1e-1,1e0,1e1,1e2,1e3,1e4,1e5]
param_grid = dict(gamma = gamma_range, C = c_range)
grid = GridSearchCV(clf, param_grid, cv= 10, scoring="accuracy")
%time grid.fit(X_norm, y)
returns:
Wall time: 32min 59s
GridSearchCV(cv=10, error_score='raise',
estimator=SVC(C=1.0, cache_size=600, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=True, random_state=None,
shrinking=True, tol=0.001, verbose=False),
fit_params={}, iid=True, loss_func=None, n_jobs=1,
param_grid={'C': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0, 100000.0], 'gamma': [1e-07, 1e-06, 1e-05, 0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]},
pre_dispatch='2*n_jobs', refit=True, score_func=None,
scoring='accuracy', verbose=0)