7

My goal is to do a grid search over the parameter space of various VW models (trying different loss functions, regularization settings, etc.). Since a model could use multiple passes, I would like to use cross validation. Should I implement my own cross-validation code (perhaps as a bash script), or would I be reinventing the wheel? Any pointers on whether this has been done before, or on the best way to proceed, would be useful. My current plan is to implement cross validation in a bash script and use GNU parallel to parallelize the grid search.

vkmv

2 Answers

6

You should try the vw-hypersearch Perl script ( https://github.com/JohnLangford/vowpal_wabbit/blob/HEAD/utl/vw-hypersearch ), which can also be found in the utl directory of VW. It can help you tune the VW parameters, but as far as cross-validation goes, you have to implement your own code, feeding the algorithm the data folds you intend to validate on.
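As a rough illustration of the "implement your own" part, here is a minimal Python sketch that splits a VW-format data set into k folds and builds the matching train and test command lines. The file names (`train_0.vw`, `test_0.vw`, the model path) and the helper names are hypothetical, and the commands are only constructed, not executed. It also assumes the examples can be shuffled, which may not be appropriate for time-ordered data.

```python
import random
from pathlib import Path


def split_folds(lines, k, seed=0):
    """Shuffle the examples and deal them into k roughly equal folds."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def write_cv_files(lines, k, out_dir, seed=0):
    """Write train_i.vw / test_i.vw pairs and return their paths.

    Each input line is expected to be one VW example ending in a newline.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    folds = split_folds(lines, k, seed)
    pairs = []
    for i, held_out in enumerate(folds):
        # Training data for fold i is every example not held out in fold i.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        train_path = out / f"train_{i}.vw"
        test_path = out / f"test_{i}.vw"
        train_path.write_text("".join(train))
        test_path.write_text("".join(held_out))
        pairs.append((train_path, test_path))
    return pairs


def vw_commands(train_path, test_path, model_path, extra_args):
    """Build (but do not run) the train and test commands for one fold."""
    # -d: data file, -f: write the final model to this file
    train_cmd = ["vw", "-d", str(train_path), "-f", str(model_path), *extra_args]
    # -t: test only (no learning), -i: load the model trained above
    test_cmd = ["vw", "-t", "-d", str(test_path), "-i", str(model_path)]
    return train_cmd, test_cmd
```

Each command list can then be handed to `subprocess.run`, for example with `extra_args=["--loss_function", "logistic", "--l2", "1e-6", "--passes", "3", "-c"]`, and the per-fold losses averaged. How the loss is read back from vw's output is left out here, since it depends on how you choose to capture it.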

arielf
Luca Massaron
  • 3
    I'm guessing this page didn't exist when LucaM gave his answer, but the following link has some good instructions: https://github.com/JohnLangford/vowpal_wabbit/wiki/Using-vw-hypersearch – jarfa Oct 31 '14 at 15:57
1

Allow me to answer this question in two parts:

  • Cross Validation: vw has no flag for this. The reasoning is that even after cross validation, one would still test on a future split and evaluate what the model has learned using some metric derived from the confusion matrix.
  • Hyper-parameter Search: vw-hypersearch uses golden-section search to find an optimal value of a single parameter within a given range. Golden-section search works for a function that is unimodal over that range (a single optimum, with the function monotonic on either side of it). When searching over several parameters at once, that assumption no longer holds in general. This can be handled, as you pointed out, with:

    -- Grid Search: very CPU-intensive and time-consuming (we always fight with time).

    -- Random Search: very efficient. Reference: http://dl.acm.org/citation.cfm?id=2188395
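To make the two ideas above concrete, here is a small Python sketch: a generic golden-section minimizer for a one-dimensional unimodal function, and a random sampler over a few VW options. This is not the code of vw-hypersearch itself. The function names, the sampled ranges, and the choice of options (`--loss_function`, `--l1`, `--l2`, `--passes`) are assumptions made for illustration.

```python
import math
import random


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize f on [lo, hi], assuming f is unimodal on that interval."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c is reused as the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d is reused as the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def log_uniform(rng, lo, hi):
    """Sample a value whose base-10 logarithm is uniform between lo and hi."""
    return 10 ** rng.uniform(math.log10(lo), math.log10(hi))


def random_configs(n, seed=0):
    """Draw n random configurations (the ranges here are illustrative only)."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n):
        configs.append({
            "--loss_function": rng.choice(["squared", "logistic", "hinge"]),
            "--l1": log_uniform(rng, 1e-9, 1e-3),
            "--l2": log_uniform(rng, 1e-9, 1e-3),
            "--passes": rng.randint(1, 10),
        })
    return configs


def to_args(config):
    """Flatten one configuration into a list of command-line arguments."""
    args = []
    for flag, value in config.items():
        args += [flag, str(value)]
    return args
```

Unlike a grid, the number of trials `n` is fixed in advance regardless of how many parameters are searched, so each sampled configuration can be evaluated as an independent job and the search can be stopped at any point.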

Pramit