
I'm having problems using all cores on my computer for training and cross-validation of an XGBoost model.

Data:

import xgboost as xgb

data_dmatrix = xgb.DMatrix(data=X, label=y, nthread=-1)
dtrain = xgb.DMatrix(X_train, label=y_train, nthread=-1)
dtest = xgb.DMatrix(X_test, label=y_test, nthread=-1)

Model:

from xgboost import XGBRegressor

xg_model = XGBRegressor(objective='reg:linear', colsample_bytree=0.3, learning_rate=0.2,
                        max_depth=5, alpha=10, n_estimators=100, subsample=0.4,
                        booster='gbtree', n_jobs=-1)

and then, if I do model training with:

xgb.train(
    xg_model.get_xgb_params(),
    dtrain,
    num_boost_round=500,
    evals=[(dtest, "Test")],
    early_stopping_rounds=200)

It works OK, but it uses only one thread to run XGBoost. CPU usage sits at about 25%. It ignores n_jobs=-1.

But if I do cross-validation with the scikit-learn implementation:

scores = cross_val_score(xg_model, X, y, cv=kfold, n_jobs=-1)

then it uses all cores. How can I force xgb.train and xgb.cv to use all cores?

Hrvoje

2 Answers


Boosting is an inherently sequential algorithm: tree t+1 can only be trained after trees 1..t have been trained. For parallelization, therefore, XGBoost "does the parallelization WITHIN a single tree", as noted here. With max_depth=5 your trees are comparatively small, so parallelizing the tree-building step isn't noticeable.

cross_val_score, however, trains K different XGBoost models in parallel, and these models are completely independent of each other. In my experience, this sort of coarse-grained parallelism via cross_val_score or GridSearchCV is always faster than parallelizing an individual model.

One alternative is to use the random forest variant, XGBRFClassifier (or XGBRFRegressor for regression). Unlike boosting algorithms, and like cross_val_score, random forest is embarrassingly parallel.

Shihab Shahriar Khan

The constraint mentioned in the accepted answer does not seem to be the critical issue for XGBoost here. I have also faced the problem that n_jobs = -1 does not work. Apparently it is related to a known issue in XGBoost. See here.

When I set n_jobs to the number of threads I require, multiple cores are used. With n_jobs = 16, my training time is now reduced by nearly 10x.

Bodhi