
I'm having problems using all cores on my computer for training and cross-validation of an XGBoost model.

Data:

import xgboost as xgb

data_dmatrix = xgb.DMatrix(data=X, label=y, nthread=-1)
dtrain = xgb.DMatrix(X_train, label=y_train, nthread=-1)
dtest = xgb.DMatrix(X_test, label=y_test, nthread=-1)

Model:

from xgboost import XGBRegressor

xg_model = XGBRegressor(objective='reg:linear', colsample_bytree=0.3, learning_rate=0.2,
                        max_depth=5, alpha=10, n_estimators=100, subsample=0.4,
                        booster='gbtree', n_jobs=-1)

and then, if I do model training with:

xgb.train(
    xg_model.get_xgb_params(),
    dtrain,
    num_boost_round=500,
    evals=[(dtest, "Test")],
    early_stopping_rounds=200)

It works OK, but it uses only one thread to run XGBoost. CPU usage sits at about 25%. It ignores n_jobs=-1.

But if I do cross-validation with the scikit-learn implementation:

scores = cross_val_score(xg_model, X, y, cv=kfold, n_jobs=-1)

then it uses all cores. How can I force xgb.train and xgb.cv to use all cores?

Hrvoje

2 Answers


Boosting is an inherently sequential algorithm: tree t+1 can only be trained after trees 1..t have been trained. For parallelization, therefore, XGBoost "does the parallelization WITHIN a single tree", as noted here. With max_depth=5 your trees are comparatively small, so parallelizing the tree-building step isn't noticeable.

cross_val_score, however, trains K different XGBoost models in parallel, and these models are completely independent of each other. In my experience, this sort of coarse-grained parallelism via cross_val_score or GridSearchCV is always faster than parallelizing an individual model.

One alternative is to use the random forest variant, XGBRFClassifier (or XGBRFRegressor for regression). Unlike boosting algorithms, and like cross_val_score, random forest is embarrassingly parallel.

Shihab Shahriar Khan

The constraint mentioned in the accepted answer does not seem to be the critical issue for XGBoost here. I have also faced the problem that n_jobs = -1 does not work. Apparently it is related to a known issue in XGBoost. See here.

When I set n_jobs to the number of threads I require, multiple cores are used. With n_jobs = 16, my training time is now reduced by nearly 10x.

Bodhi