I am trying to evaluate an SVM on a large dataset of roughly 300,000 records. It is a multiclass problem with 23 features. Currently GridSearchCV takes ages to iterate over the parameter grid. Is there any strategy to speed this up? I would think 300,000 records is a reasonable size, and I am perplexed that CPU usage never goes beyond 30% and RAM usage stays under 50%. I set n_jobs=-1 and pre_dispatch=1 as suggested in the documentation, but nothing changes. With my inputs I expect a total of 48 parameter combinations (4 C values × 3 kernels × 4 degrees), each fit once per CV fold. Here is my sample code:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search is deprecated/removed
from sklearn import svm

model_to_set = OneVsRestClassifier(svm.SVC())
parameters = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf", "linear"],
    "estimator__degree": [1, 2, 3, 4],
}
model_tuning = GridSearchCV(model_to_set, param_grid=parameters, n_jobs=-1,
                            pre_dispatch=1,
                            scoring='f1_macro')  # plain 'f1' raises an error for multiclass targets

# Assuming `mat` holds one record per row, with the class label in column 0
# and the 23 features in the remaining columns
model_tuning.fit(mat[:, 1:], mat[:, 0])
Appreciate any help.
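For reference, here is a quick sanity check of how many candidate settings the grid above actually expands to (this is a standalone count using only the parameter lists, not a run of the search itself):

```python
from itertools import product

# Same grid as in the question
param_grid = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf", "linear"],
    "estimator__degree": [1, 2, 3, 4],
}

# GridSearchCV tries the Cartesian product of all value lists,
# and each combination is refit once per cross-validation fold.
n_candidates = len(list(product(*param_grid.values())))
print(n_candidates)  # 48 combinations
```

Note that `degree` is counted for every kernel even though SVC only uses it for the poly kernel, which is part of why the grid is larger than it looks.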