
I am using XGBoost with early stopping. After about 1000 epochs, the model is still improving, but the magnitude of improvement is very low. I.e.:

 clf = xgb.train(params, dtrain, num_boost_round=num_rounds, evals=watchlist, early_stopping_rounds=10)

Is it possible to set a "tol" for early stopping? I.e.: the minimum level of improvement that is required to not trigger early stopping.

Tol is a common parameter in SKLearn models, such as MLPClassifier and QuadraticDiscriminantAnalysis.
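For illustration, this is the sort of behaviour I mean, using scikit-learn's MLPClassifier (the parameter values below are just illustrative):

    from sklearn.neural_network import MLPClassifier

    # Training stops once the loss fails to improve by at least `tol`
    # for `n_iter_no_change` consecutive iterations.
    clf = MLPClassifier(max_iter=1000, tol=1e-4, n_iter_no_change=10)

Thank you.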

Chris Parry

3 Answers


I do not think there is a `tol` parameter in xgboost, but you can set `early_stopping_rounds` higher. This parameter means that if the performance on the evaluation set does not improve for `early_stopping_rounds` consecutive rounds, training stops. If you know that after 1000 epochs your model is still improving, but very slowly, set `early_stopping_rounds` to 50, for example, so it will be more "tolerant" of small changes in performance.
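For example, reusing the call from the question with only `early_stopping_rounds` changed:

    clf = xgb.train(params, dtrain, num_boost_round=num_rounds,
                    evals=watchlist, early_stopping_rounds=50)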

BenDes
  • Did you mean to say setting `early_stopping_rounds` lower (instead of higher)? A low value would terminate earlier than a high value. – kangaroo_cliff May 19 '21 at 04:50

The issue is still open in the XGBoost GitHub repo, so even though wrappers such as sklearn and h2o already seem to have this feature, xgboost itself is still lacking a stopping_tolerance hyperparameter...

Let's upvote it here to speed things up a bit, shall we?
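For comparison, this is roughly what the h2o wrapper exposes (a sketch only; the frame and column names and the parameter values below are illustrative):

    from h2o.estimators import H2OXGBoostEstimator

    # h2o stops when stopping_metric fails to improve by at least
    # stopping_tolerance over stopping_rounds scoring rounds.
    model = H2OXGBoostEstimator(ntrees=1000,
                                stopping_rounds=10,
                                stopping_metric="rmse",
                                stopping_tolerance=1e-4)
    model.train(x=predictors, y=response, training_frame=train)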

mirekphd

This option has been implemented.

Simply pass a value to `tolerance`:

    early_stop = xgb.callback.EarlyStopping(tolerance=1e-5)

    booster = xgb.train(
        {'objective': 'binary:logistic',
         'eval_metric': ['error', 'rmse']},
        D_train,
        evals=[(D_train, 'Train'), (D_valid, 'Valid')],
        callbacks=[early_stop],
        )
MJimitater
  • I encountered the same issue. I worked with xgboost.XGBRegressor() and could not figure out how to set the tolerance after studying all the answers in this post. My XGBRegressor did not stop even though the RMSE improves very slowly, like below: `[585] validation_1-rmse:55221.87637 [586] validation_1-rmse:55221.58118 [587] validation_1-rmse:55220.68734`. Can you please show me sample code? For example, I want to compare only the integer part of the RMSE. – jean Apr 27 '23 at 19:15
  • I use `import xgboost as xgb` and `xgb.XGBRegressor()`. – jean Apr 27 '23 at 19:23
  • I'm not sure I understand. Your RMSE between steps 585 and 586 has a difference of 0.2952 > 1e-5, so the algorithm is not meant to early stop at this point yet. – MJimitater Apr 27 '23 at 21:29
  • Thanks for your reply. For my data, a 0.2952 difference is insignificant. I would like to round the RMSE to an integer so my tree can stop early. How can I do it? – jean Apr 28 '23 at 04:22
  • Well, you just change the parameter tolerance to the minimum difference you care about, say `tolerance=1`. – MJimitater Apr 28 '23 at 09:28
  • I am new to Python and ML, so sorry to bother you with rookie questions. My callback did not work: `import xgboost as xgb` (version 1.7.3), `early_stop = xgb.callback.EarlyStopping(tolerance=1e-5)`, error: `__init__() got an unexpected keyword argument 'tolerance'`. What did I miss? Really appreciate it. – jean Apr 28 '23 at 15:37
  • If I [look here](https://github.com/dmlc/xgboost/pull/7137), the parameter name was changed to `min_delta`; please read for yourself and report back how you went. – MJimitater May 01 '23 at 11:33
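With that rename, the call from this answer would look roughly like the sketch below on a recent xgboost release (`rounds` and `min_delta` are the current `EarlyStopping` arguments; the values are just illustrative):

    # Stop when the eval metric fails to improve by at least `min_delta`
    # for `rounds` consecutive boosting rounds; e.g. min_delta=1 would
    # ignore sub-integer changes, as discussed in the comments above.
    early_stop = xgb.callback.EarlyStopping(rounds=10, min_delta=1e-5)

    booster = xgb.train(
        {'objective': 'binary:logistic',
         'eval_metric': ['error', 'rmse']},
        D_train,
        evals=[(D_train, 'Train'), (D_valid, 'Valid')],
        callbacks=[early_stop],
    )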