This is not possible with the present implementation of xgboost (referring to versions 0.6 and 0.7).
Please be careful about the difference between the native xgboost API
xgboost.train(params, dtrain, num_boost_round=10, evals=(), obj=None,
feval=None, maximize=False, early_stopping_rounds=None, evals_result=None,
verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None)
or
xgboost.cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False,
folds=None, metrics=(), obj=None, feval=None, maximize=False,
early_stopping_rounds=None, fpreproc=None, as_pandas=True, verbose_eval=None,
show_stdv=True, seed=0, callbacks=None, shuffle=True)
and the sklearn interface:
class xgboost.XGBRegressor(max_depth=3, learning_rate=0.1,
n_estimators=100, silent=True, objective='reg:linear', booster='gbtree',
n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0,
reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None,
missing=None, **kwargs)
As you can see, there is no such thing as early stopping in xgboost.XGBRegressor. Notice that the sklearn interface is the only one you can use in combination with GridSearchCV, which requires a proper sklearn estimator with .fit(), .predict(), etc.
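For comparison, a minimal sketch of early stopping with the native API (assuming dtrain and dvalid are xgboost.DMatrix objects built from your data, and params holds your usual booster parameters):

import xgboost as xgb

bst = xgb.train(params, dtrain, num_boost_round=1000,
                evals=[(dvalid, 'valid')],
                early_stopping_rounds=20)
# bst.best_iteration holds the round with the best validation score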
You could pass your early_stopping_rounds and eval_set as extra fit_params to GridSearchCV, and that would actually work. However, GridSearchCV will not change the fit_params between the different folds, so you would end up using the same eval_set in all the folds, which might not be what you mean by CV.
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

model = xgb.XGBClassifier()
clf = GridSearchCV(model, parameters,
                   fit_params={'early_stopping_rounds': 20,
                               'eval_set': [(X, y)]},
                   cv=kfold)
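As a side note, in more recent scikit-learn versions the fit_params constructor argument has been deprecated; passing the same parameters directly to .fit() works the same way:

clf = GridSearchCV(model, parameters, cv=kfold)
clf.fit(X, y, early_stopping_rounds=20, eval_set=[(X, y)])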
After some tweaking, I found the safest way to integrate early_stopping_rounds with the sklearn API is to implement an early-stopping mechanism yourself. You can do it by running a GridSearchCV with n_estimators (the number of boosting rounds) as the parameter to be tuned. You can then watch the mean validation score for the different models with increasing n_estimators, and define a custom heuristic for early stopping on top of it; you will notice that the default one is not optimal, so to speak.
I think this is also a better approach than using a single hold-out split for this purpose.
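A minimal sketch of what this could look like (X, y and the grid values are assumed; cv_results_['mean_test_score'] plays the role of mean_validation_score here, and the stopping tolerance is an arbitrary choice):

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Tune the number of boosting rounds like any other hyperparameter.
grid = {'n_estimators': [50, 100, 200, 400, 800]}
clf = GridSearchCV(xgb.XGBRegressor(learning_rate=0.1), grid, cv=5)
clf.fit(X, y)

# Mean CV score per n_estimators value, in grid order.
scores = clf.cv_results_['mean_test_score']

# Custom heuristic: pick the smallest n_estimators whose score is
# within a tolerance of the best one, rather than the plain argmax.
tol = 1e-3
n_chosen = next(n for n, s in zip(grid['n_estimators'], scores)
                if s >= scores.max() - tol)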