
There are a couple of other questions similar to this, but I couldn't find a solution that seems to fit. I am using LightGBM with scikit-optimize's BayesSearchCV.

full_pipeline = skl.Pipeline(steps=[('preprocessor', pre_processor),
                                    ('estimator', lgbm.sklearn.LGBMClassifier())])
scorer = make_scorer(fl.lgb_focal_f1_score)
lgb_tuner = sko.BayesSearchCV(full_pipeline, hyper_space, cv=5, refit=True,
                              n_iter=num_calls, scoring=scorer)
lgb_tuner.fit(balanced_xtrain, balanced_ytrain)

Training runs for a while before it errors with the following:

Traceback (most recent call last):
  File "/var/training.py", line 134, in <module>
    lgb_tuner.fit(balanced_xtrain, balanced_ytrain)
  File "/usr/local/lib/python3.6/site-packages/skopt/searchcv.py", line 694, in fit
    groups=groups, n_points=n_points_adjusted
  File "/usr/local/lib/python3.6/site-packages/skopt/searchcv.py", line 579, in _step
    self._fit(X, y, groups, params_dict)
  File "/usr/local/lib/python3.6/site-packages/skopt/searchcv.py", line 423, in _fit
    for parameters in parameter_iterable
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
    if self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 531, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.6/site-packages/sklearn/pipeline.py", line 335, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 857, in fit
    callbacks=callbacks, init_model=init_model)
  File "/usr/local/lib/python3.6/site-packages/lightgbm/sklearn.py", line 617, in fit
    callbacks=callbacks, init_model=init_model)
  File "/usr/local/lib/python3.6/site-packages/lightgbm/engine.py", line 252, in train
    booster.update(fobj=fobj)
  File "/usr/local/lib/python3.6/site-packages/lightgbm/basic.py", line 2467, in update
    return self.__boost(grad, hess)
  File "/usr/local/lib/python3.6/site-packages/lightgbm/basic.py", line 2503, in __boost
    ctypes.byref(is_finished)))
  File "/usr/local/lib/python3.6/site-packages/lightgbm/basic.py", line 55, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /__w/1/s/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 651 .

Some answers to similar questions have suggested it might be a consequence of using the GPU, but I don't have a GPU available. I don't know what else could be causing it or how to try to fix it. Can anyone suggest anything?

Lucy

2 Answers


I think this was due to my hyperparameter limits being wrong, which allowed one hyperparameter to be set to zero when it shouldn't have been, though I'm not sure which one.
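For anyone hitting the same error: a search space whose lower bounds allow degenerate values (e.g. `num_leaves` below 2, or a subsampling fraction of 0) can produce splits with an empty side, which is exactly what the `left_count > 0` check complains about. A minimal, library-free sanity check on the bounds before launching the search could look like this (the parameter names and minimum values below are illustrative assumptions, not an exhaustive list):

```python
# Hypothetical lower limits for LightGBM hyperparameters that must stay
# strictly positive; values below these can lead to empty splits and the
# "Check failed: (best_split_info.left_count) > (0)" error.
SAFE_MINIMUMS = {
    "estimator__min_child_samples": 1,    # at least one sample per leaf
    "estimator__num_leaves": 2,           # a tree needs at least two leaves
    "estimator__subsample": 1e-3,         # 0 would sample no rows
    "estimator__colsample_bytree": 1e-3,  # 0 would sample no features
}

def check_search_space(hyper_space):
    """Return (name, low) pairs whose lower bound is suspiciously small.

    `hyper_space` maps parameter names to (low, high) bound tuples, as one
    might use when building a BayesSearchCV search space.
    """
    bad = []
    for name, (low, high) in hyper_space.items():
        minimum = SAFE_MINIMUMS.get(name)
        if minimum is not None and low < minimum:
            bad.append((name, low))
    return bad

space = {
    "estimator__num_leaves": (0, 64),          # 0 is invalid
    "estimator__min_child_samples": (1, 100),  # fine
}
print(check_search_space(space))  # [('estimator__num_leaves', 0)]
```

Running a check like this before `lgb_tuner.fit(...)` would have flagged the bad limit up front instead of failing mid-search.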

Lucy

I also encountered a similar error message. Some people said it was related to the GPU or to categorical features, but neither applies in my case: all my features are numerical and I don't use a GPU. My LightGBM version is 3.1.1.

I suspect the problem is caused by features that are mostly missing values. In my case, two features were NaN in more than half of the rows. While this doesn't explain the exact reason, such columns can cause problems with sampling or binning in the background.

You can remove these features. Or, if they need to be used, set the `zero_as_missing` parameter to `True`, or handle the missing values in your own way. This solved the problem for me.
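As a sketch of the first workaround described above (the 50% threshold and the sample data are just an example, not a recommendation), dropping the mostly-missing columns before training might look like this:

```python
import numpy as np

def drop_mostly_missing(X, threshold=0.5):
    """Drop columns of a 2-D float array whose NaN fraction exceeds threshold.

    Returns the filtered array and the indices of the kept columns.
    """
    nan_fraction = np.isnan(X).mean(axis=0)          # NaN share per column
    keep = np.where(nan_fraction <= threshold)[0]    # columns to retain
    return X[:, keep], keep

X = np.array([
    [1.0, np.nan, 3.0],
    [2.0, np.nan, np.nan],
    [3.0, np.nan, 5.0],
])
X_clean, kept = drop_mostly_missing(X)
print(kept)  # [0 2] -- column 1 is entirely NaN and gets dropped
```

Note that `zero_as_missing=true` tells LightGBM to treat zeros as missing, so it only helps when the missing values in your data are encoded as zeros.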

Atacan
  • This does not really answer the question. If you have a different question, you can ask it by clicking [Ask Question](https://stackoverflow.com/questions/ask). To get notified when this question gets new answers, you can [follow this question](https://meta.stackexchange.com/q/345661). Once you have enough [reputation](https://stackoverflow.com/help/whats-reputation), you can also [add a bounty](https://stackoverflow.com/help/privileges/set-bounties) to draw more attention to this question. - [From Review](/review/low-quality-posts/34518072) – Thomas Schillaci Jun 12 '23 at 11:53
  • If `zero_as_missing` is one possible solution to this error, move that to the top of the answer and provide some more detail on what this config setting does. I do think the context provided in the rest of the answer can be helpful. – Banjer Jun 12 '23 at 13:03
  • @ThomasSchillaci it solved the problem, and it is also an answer. And this is an open issue on lightgbm. My answer may also help with it. And you try to explain how to use Stack Overflow to me?? – Atacan Jun 12 '23 at 15:44
  • @Atacan my comment was auto-generated from SO's moderation system as your answer was flagged for having length and content issues. Banjer explains well how you can improve it – Thomas Schillaci Jun 13 '23 at 08:56