This is not possible with the present implementation of xgboost (referring to versions 0.6 and 0.7).
Please be careful about the difference between the native xgboost API
xgboost.train(params, dtrain, num_boost_round=10, evals=(), obj=None,
feval=None, maximize=False, early_stopping_rounds=None, evals_result=None,
verbose_eval=True, xgb_model=None, callbacks=None, learning_rates=None)
or
xgboost.cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False,
folds=None, metrics=(), obj=None, feval=None, maximize=False,
early_stopping_rounds=None, fpreproc=None, as_pandas=True, verbose_eval=None,
show_stdv=True, seed=0, callbacks=None, shuffle=True)
and the sklearn interface:
class xgboost.XGBRegressor(max_depth=3, learning_rate=0.1,
n_estimators=100, silent=True, objective='reg:linear', booster='gbtree',
n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
subsample=1, colsample_bytree=1, colsample_bylevel=1, reg_alpha=0,
reg_lambda=1, scale_pos_weight=1, base_score=0.5, random_state=0, seed=None,
missing=None, **kwargs)
As you can see, there is no such thing as early stopping in xgboost.XGBRegressor. Notice that the sklearn interface is the only one you can use in combination with GridSearchCV, which requires a proper sklearn estimator with .fit(), .predict(), etc.
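For comparison, a minimal sketch of early stopping with the native API (assuming dtrain and dvalid are xgboost.DMatrix objects built from your data, and params holds your usual booster parameters):

import xgboost as xgb

bst = xgb.train(params, dtrain, num_boost_round=1000,
                evals=[(dvalid, 'valid')],
                early_stopping_rounds=20)
# bst.best_iteration holds the round with the best validation score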
You could pass your early_stopping_rounds and eval_set as extra fit_params to GridSearchCV, and that would actually work. However, GridSearchCV will not change the fit_params between the different folds, so you would end up using the same eval_set in all the folds, which might not be what you mean by CV.
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

model = xgb.XGBClassifier()
clf = GridSearchCV(model, parameters,
                   fit_params={'early_stopping_rounds': 20,
                               'eval_set': [(X, y)]},
                   cv=kfold)
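As a side note, in more recent scikit-learn versions the fit_params constructor argument has been deprecated; passing the same parameters directly to .fit() works the same way:

clf = GridSearchCV(model, parameters, cv=kfold)
clf.fit(X, y, early_stopping_rounds=20, eval_set=[(X, y)])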
After some tweaking, I found the safest way to integrate early_stopping_rounds with the sklearn API is to implement an early-stopping mechanism yourself. You can do it by running a GridSearchCV with n_estimators (the number of boosting rounds) as the parameter to be tuned. You can then watch the mean validation score for the different models with increasing n_estimators, and define a custom heuristic for early stopping on top of it; you will notice that the default one is not optimal, so to speak.
I think this is also a better approach than using a single hold-out split for this purpose.
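A minimal sketch of what this could look like (X, y and the grid values are assumed; cv_results_['mean_test_score'] plays the role of mean_validation_score here, and the stopping tolerance is an arbitrary choice):

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Tune the number of boosting rounds like any other hyperparameter.
grid = {'n_estimators': [50, 100, 200, 400, 800]}
clf = GridSearchCV(xgb.XGBRegressor(learning_rate=0.1), grid, cv=5)
clf.fit(X, y)

# Mean CV score per n_estimators value, in grid order.
scores = clf.cv_results_['mean_test_score']

# Custom heuristic: pick the smallest n_estimators whose score is
# within a tolerance of the best one, rather than the plain argmax.
tol = 1e-3
n_chosen = next(n for n, s in zip(grid['n_estimators'], scores)
                if s >= scores.max() - tol)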