
Could somebody please explain the xgboost.cv function from XGBoost's native interface?

Below is the first part of the demo from https://xgboost.readthedocs.io/en/stable/python/examples/cross_validation.html that illustrates my confusion.

import os
import numpy as np
import xgboost as xgb

# load data and do training
CURRENT_DIR = os.path.dirname(__file__)
dtrain = xgb.DMatrix(os.path.join(CURRENT_DIR, '../data/agaricus.txt.train'))
param = {'max_depth':2, 'eta':1, 'objective':'binary:logistic'}
num_round = 2

print('running cross validation')
# do cross validation, this will print result out as
# [iteration]  metric_name:mean_value+std_value
# std_value is standard deviation of the metric
xgb.cv(param, dtrain, num_round, nfold=5,
       metrics={'error'}, seed=0,
       callbacks=[xgb.callback.EvaluationMonitor(show_stdv=True)])

print('running cross validation, disable standard deviation display')
# do cross validation, this will print result out as
# [iteration]  metric_name:mean_value
res = xgb.cv(param, dtrain, num_boost_round=10, nfold=5,
             metrics={'error'}, seed=0,
             callbacks=[xgb.callback.EvaluationMonitor(show_stdv=False),
                        xgb.callback.EarlyStopping(3)])
print(res)

What is being optimized here? And why does param contain only a single value for each parameter, rather than a set of candidate values from which the CV picks the best one?

In general, when should this function be used?

wplo
    Does this answer your question? [understanding python xgboost cv](https://stackoverflow.com/questions/34469038/understanding-python-xgboost-cv) – Ben Reiniger May 19 '22 at 17:56
