
I'm trying to speed up GridSearchCV (for tuning the hyperparameters of an RBF-kernel SVR) in Python. This, however, takes forever. I have a moderately small dataset (dimensions 600 x 8), so I don't think dimensionality is the problem.

I've read about BaggingRegressor in this post: Making SVM run faster in python, but I can't seem to make it work for regression with GridSearchCV.

The following piece of code works, but takes a really long time to compute.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    parameters = {'epsilon': np.arange(0.1, 1.0, 0.01),
                  'C': 2.0 ** np.arange(-2, 9),
                  'gamma': np.arange(0.1, 1.0, 0.01)}
    svr = SVR(kernel='rbf')
    clf = GridSearchCV(svr, parameters)
    clf.fit(X_train, y_train)

So, I tried to speed it up like this:

    from sklearn.ensemble import BaggingRegressor

    parameters = {'epsilon': np.arange(0.1, 1.0, 0.01),
                  'C': 2.0 ** np.arange(-2, 9),
                  'gamma': np.arange(0.1, 1.0, 0.01)}
    svr = SVR(kernel='rbf')
    clf = GridSearchCV(svr, parameters)
    # this wraps the whole grid search, so every estimator in the
    # bagging ensemble re-runs the full search on its bootstrap sample
    clf = BaggingRegressor(clf)
    clf.fit(X_train, y_train)

But this doesn't speed up the process at all.

I'm afraid I don't fully understand how BaggingRegressor works, so if anybody has some insights, please let me know!

GetHacked
Riley

2 Answers


This has nothing really to do with SVR or BaggingRegressor as an algorithm; it's simply the parameter grid you use. There is no need for such a small step size for epsilon and gamma.

    >>> len(np.arange(0.1, 1.0, 0.01))
    90

So you span a grid of 90 * 90 * 11 = 89100 parameter combinations. Even if your classifier/regressor takes only a second to train, you have to wait over 24 hours! Just take coarser steps (e.g. 0.1), as the results aren't that sensitive to these parameters. If there is a region of interest (e.g. small epsilon), you may consider some sort of geometric series instead of linearly increasing values.
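For illustration, a coarser grid along these lines shrinks the search by orders of magnitude. The concrete ranges below are my own choice, not prescribed, and X_train/y_train are assumed as in the question:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    # 5 * 6 * 5 = 150 combinations instead of 89100
    parameters = {'epsilon': np.arange(0.1, 1.0, 0.2),   # coarse linear steps
                  'C': 2.0 ** np.arange(-2, 9, 2),       # geometric series
                  'gamma': np.logspace(-2, 0, num=5)}    # geometric series
    clf = GridSearchCV(SVR(kernel='rbf'), parameters)
    clf.fit(X_train, y_train)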

Marcus V.

I would suggest using RandomizedSearchCV, which samples a fixed number of parameter combinations instead of exhausting the grid. Besides that, the training time of SVMs grows superlinearly (roughly quadratic to cubic) with the number of samples, so they get slow on huge datasets.
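A minimal sketch of what that could look like; the distributions and n_iter below are my own illustrative choices, assuming the same X_train/y_train as in the question:

    import numpy as np
    from scipy.stats import loguniform, uniform
    from sklearn.svm import SVR
    from sklearn.model_selection import RandomizedSearchCV

    # sample 50 random combinations instead of the full 89100-point grid
    param_distributions = {'epsilon': uniform(0.1, 0.9),      # uniform on [0.1, 1.0)
                           'C': loguniform(2 ** -2, 2 ** 8),  # log-uniform
                           'gamma': loguniform(1e-2, 1.0)}
    search = RandomizedSearchCV(SVR(kernel='rbf'), param_distributions,
                                n_iter=50, random_state=0)
    search.fit(X_train, y_train)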

b4shyou