  1. What are the differences between the sklearn API (LGBMModel, LGBMClassifier, etc.) and the default API (lgb.Dataset, lgb.cv, lgb.train) of LightGBM? Which one should I prefer using?

  2. Is it better to use lgb.cv or sklearn's GridSearchCV/RandomizedSearchCV when using LightGBM?


1 Answer

  1. The differences between the two APIs have been well covered in this answer here.

  2. Based on this notebook by Will Koehrsen, the sklearn cross-validation API does not include an option for early stopping. Therefore, if you wish to use early stopping rounds (which can be very useful if you want to stop training when the validation score has not improved for a given number of estimators), it is better to use the LightGBM cross-validation function (lgb.cv); see the sketch after this list.

    Furthermore, an excerpt from Mikhail Lisyovi's answer: "Technically, lightgbm.cv() allows you only to evaluate performance on a k-fold split with fixed model parameters. For hyper-parameter tuning you will need to run it in a loop, providing different parameters and recording the averaged performance, in order to choose the best parameter set after the loop is complete. This interface is different from sklearn, which provides you with complete functionality to do hyperparameter optimisation in a CV loop. Personally, I would recommend using the sklearn API of LightGBM. It is just a wrapper around the native lightgbm.train() functionality, thus it is not slower. But it allows you to use the full stack of the sklearn toolkit, which makes your life MUCH easier."
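To make the early-stopping point concrete, here is a minimal sketch of lgb.cv with early stopping on a toy dataset. The parameter values are illustrative only, and the exact argument depends on your LightGBM version (recent releases use the lgb.early_stopping callback, older ones accepted early_stopping_rounds=... directly).

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "learning_rate": 0.05,   # illustrative values, not tuned
    "num_leaves": 31,
}

# Cross-validated training that stops once the mean validation loss
# has not improved for 50 consecutive boosting rounds.
cv_results = lgb.cv(
    params,
    train_set,
    num_boost_round=1000,    # upper bound on boosting rounds
    nfold=5,
    stratified=True,
    callbacks=[lgb.early_stopping(stopping_rounds=50)],
)

# Each entry of cv_results is a list with one value per boosting round
# actually performed, so its length gives the early-stopped round count.
best_num_rounds = len(next(iter(cv_results.values())))
print("Best number of boosting rounds:", best_num_rounds)
```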

Thus, which method you end up using depends on the context of the problem as well as which factor matters more to you: early_stopping_rounds or ease of hyperparameter optimisation over varying parameters.
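For the second route, a rough sketch of the sklearn-wrapper approach is shown below: tuning LGBMClassifier with RandomizedSearchCV. The search space is made up purely for illustration, not a recommended grid.

```python
import lightgbm as lgb
from scipy.stats import randint, uniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Hypothetical search space; adjust ranges to your own problem.
param_distributions = {
    "num_leaves": randint(15, 63),
    "learning_rate": uniform(0.01, 0.2),
    "n_estimators": randint(100, 500),
    "min_child_samples": randint(5, 50),
}

search = RandomizedSearchCV(
    lgb.LGBMClassifier(objective="binary"),
    param_distributions=param_distributions,
    n_iter=20,
    cv=5,
    scoring="roc_auc",
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```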

  • One add-on here on the usage of early stopping. In fact, sklearn v0.20.3 already accepts fit parameters as one of the arguments to `cross_validate`: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html . So if you have a hold-out set you can use it directly. If you want to use the hold-out fold within the CV split for early stopping, then you can write a loop yourself iterating over a CV iterator, e.g. `KFold`, and do the fitting and evaluation in your favourite manner. – Mischa Lisovyi Mar 08 '19 at 07:17
  • Thanks for the response @MykhailoLisovyi – Sift Mar 15 '19 at 01:50
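Following up on that comment, a hedged sketch of passing early-stopping arguments through `cross_validate`'s fit parameters with a fixed hold-out set might look like this. Argument names vary across versions: newer LightGBM uses the early_stopping callback instead of early_stopping_rounds, and newer scikit-learn renames `fit_params` to `params`.

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate, train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Reserve a fixed hold-out set used only for early stopping.
X_cv, X_hold, y_cv, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)

clf = lgb.LGBMClassifier(n_estimators=1000, learning_rate=0.05)

results = cross_validate(
    clf,
    X_cv,
    y_cv,
    cv=5,
    scoring="roc_auc",
    # fit_params is forwarded to LGBMClassifier.fit on every fold;
    # renamed to `params` in recent scikit-learn releases.
    fit_params={
        "eval_set": [(X_hold, y_hold)],
        "callbacks": [lgb.early_stopping(stopping_rounds=50)],
    },
)
print(results["test_score"].mean())
```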