3

I have been digging through the GitHub pages and reading the documentation, but I still don't fully understand whether HyperbandSearchCV will help speed up hyperparameter optimization in my case.

I am using scikit-learn's Pipeline functionality, and I am also testing models like LinearRegression(), which doesn't support partial_fit: it has to fit its parameters on all of the data at once. Can HyperbandSearchCV still be used in this case? If so, what exactly is it optimizing, given that, as far as I understand, neither Pipeline nor these models implement partial_fit? The HyperbandSearchCV API documentation says the estimator must implement partial_fit. However, another part of the documentation says it can be a drop-in replacement for RandomizedSearchCV, since it simply spends less time training low-performing models.

If anyone can clarify this for me, that would be great.

Ife A

2 Answers

1

Based on the recent blog post at https://blog.dask.org/2019/09/30/dask-hyperparam-opt, HyperbandSearchCV does require models that implement partial_fit, because the whole point of HyperbandSearchCV is to decide whether a model is promising without training it on the entire dataset. This is where its speed advantage comes from. My reading of the blog post is that once a model is fully trained, there is nothing left for HyperbandSearchCV to do: no early stopping remains. However, this may be specific to the Dask implementation and not necessarily true of the Hyperband algorithm described in the original paper, which I should re-read.
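
To make the early-stopping idea concrete, here is a minimal pure-Python sketch of a single round of successive halving, the building block that Hyperband repeats with different budgets. It is not the Dask-ML implementation: `RunningMeanModel`, `successive_halving`, and the learning-rate grid are hypothetical names and values invented for illustration. The only point is that candidates can be compared and discarded after seeing part of the data, which requires a `partial_fit`-style method.

```python
import random


class RunningMeanModel:
    """Toy incremental model (hypothetical): predicts one constant value."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.estimate = 0.0

    def partial_fit(self, targets):
        # Update on one chunk; the state carries over between calls.
        for y in targets:
            self.estimate += self.learning_rate * (y - self.estimate)
        return self

    def score(self, targets):
        # Negative mean squared error, so higher is better.
        return -sum((y - self.estimate) ** 2 for y in targets) / len(targets)


def successive_halving(learning_rates, chunks, validation):
    """Train every candidate on one chunk, keep the best half, repeat."""
    candidates = [RunningMeanModel(lr) for lr in learning_rates]
    calls = 0  # number of partial_fit calls actually spent
    for chunk in chunks:
        for model in candidates:
            model.partial_fit(chunk)
            calls += 1
        if len(candidates) > 1:
            candidates.sort(key=lambda m: m.score(validation), reverse=True)
            candidates = candidates[: max(1, len(candidates) // 2)]
    return candidates[0], calls


rng = random.Random(0)
data = [rng.gauss(5.0, 1.0) for _ in range(400)]
chunks = [data[i:i + 100] for i in range(0, 400, 100)]  # 4 chunks of 100
validation = [rng.gauss(5.0, 1.0) for _ in range(100)]
grid = [1e-4, 1e-3, 1e-2, 0.05, 0.1, 0.3, 0.6, 1.0]  # 8 candidates

best, calls = successive_halving(grid, chunks, validation)
full_cost = len(grid) * len(chunks)  # cost of training every candidate fully
print(f"partial_fit calls used: {calls} (full training would need {full_cost})")
```

With 8 candidates and 4 chunks, the sketch spends 8 + 4 + 2 + 1 = 15 `partial_fit` calls instead of 32. A model that can only be fitted on all the data at once offers no intermediate point at which to discard it, so this saving disappears.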

wishihadabettername
  • Does this mean there will be no performance improvement in the hyperparameter search if the model doesn't implement partial_fit? Perhaps HyperbandSearchCV gains speed elsewhere during the search rather than in the model fitting itself, do you think? – Ife A Oct 23 '19 at 21:02
  • According to the docs, HyperbandSearchCV only works with models implementing `partial_fit()`. It's not a question of being slower, but of not working at all. In fact, you can try it yourself, since you already have the pipeline. – wishihadabettername Oct 24 '19 at 01:24
  • You can also read the source at https://github.com/dask/dask-ml/blob/master/dask_ml/model_selection/_hyperband.py. – wishihadabettername Oct 24 '19 at 01:30
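
One quick way to try this, sketched with scikit-learn only, is to check which estimators expose `partial_fit` at all. The particular estimators and the pipeline steps below are illustrative choices, not taken from the question.

```python
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative candidates: a closed-form model, an SGD-based model,
# and a Pipeline wrapping the SGD-based model.
estimators = {
    "LinearRegression": LinearRegression(),
    "SGDRegressor": SGDRegressor(),
    "Pipeline(StandardScaler, SGDRegressor)": Pipeline(
        [("scale", StandardScaler()), ("model", SGDRegressor())]
    ),
}

# An estimator supports incremental training only if it has partial_fit.
supports_partial_fit = {
    name: hasattr(est, "partial_fit") for name, est in estimators.items()
}

for name, ok in supports_partial_fit.items():
    print(f"{name}: partial_fit available = {ok}")
```

SGDRegressor exposes `partial_fit`, while LinearRegression and scikit-learn's Pipeline do not, even when the final pipeline step does.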
0

There is a wrapper in Dask-ML for training scikit-learn models incrementally. Refer to this link: https://ml.dask.org/incremental.html

First, wrap your scikit-learn estimator with this wrapper. Note that the wrapper calls the underlying estimator's partial_fit on each block of data, so the estimator itself must implement it; by default, very few scikit-learn models do. After that, you can try HyperbandSearchCV or any other hyperparameter search technique that trains incrementally.

Professor