I am using XGBoost via the R package and did not specify the nthread parameter (it should default to the maximum number of available cores, which it does on Ubuntu).

On a Windows PC with an i7-4770 CPU (4 cores, 8 threads), however, CPU usage peaks at about 50%, even when I manually set nthread = 8. (The exact same code reaches 100% CPU usage under Ubuntu, so I don't think this is an implementation issue.) I also tried nthread = 4, which leads to around 30% CPU usage.
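For reference, here is a minimal sketch of setting nthread explicitly from the detected core count. The toy data, the objective, and the variable names are assumptions made for illustration, not the code from the setup described above. The training step only runs if the xgboost package is installed.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1.
# logical = FALSE is honoured on Windows; other platforms may ignore it.
logical_cores  <- detectCores(logical = TRUE)
physical_cores <- detectCores(logical = FALSE)
if (is.na(logical_cores)) logical_cores <- 1L

# Hypothetical parameter list; the objective is only a placeholder.
params <- list(
  objective = "binary:logistic",
  nthread   = logical_cores
)

if (requireNamespace("xgboost", quietly = TRUE)) {
  # Toy data, for illustration only.
  set.seed(1)
  x <- matrix(rnorm(1000 * 10), ncol = 10)
  y <- as.numeric(x[, 1] > 0)

  dtrain <- xgboost::xgb.DMatrix(data = x, label = y)
  model  <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 5)
}
```

On an i7-4770, logical_cores would be expected to be 8 and physical_cores 4, which makes it easy to try both values for nthread.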

How do I get XGBoost to use all available threads under Windows?

user3825755
  • It does run on all of your cores, but since the parallelization is done in C++, it will not use 100% of your processor the way it would when parallelizing directly in R. – JacobJacox Nov 29 '19 at 12:19
  • @JacobJacox So in Linux it works because of different ways of handling parallel processing by the OS / cpp? – user3825755 Nov 29 '19 at 13:01
  • You ask me too much :) I noticed this when parallelizing rf by hand in R or writing my own in Rcpp. – JacobJacox Nov 29 '19 at 13:37

1 Answer

I've found that the Windows XGBoost R package installed from CRAN via install.packages("xgboost") does not have OpenMP support. XGBoost relies on OpenMP for multithreading, so without it you will not get the full benefit of parallel processing and your CPUs will be under-utilised. You can confirm this in your scenario by opening the xgboost.dll file in a tool like Dependency Walker. You will note that it doesn't link against any OpenMP runtime library (usually vcomp140.dll, the Microsoft Visual C++ OpenMP runtime, on Windows).

The solution in my case was to uninstall the CRAN-supplied R package and build XGBoost and its R package from source, which was an adventure in itself, but did give me an OpenMP-enabled installation that pushed all 16 cores in my system to 100% utilisation.
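As a rough cross-check that doesn't require inspecting the DLL, you can time the same training run with one thread and with several. This is only a sketch: time_fit is a hypothetical helper, the data is synthetic, and the objective is a placeholder. If the two timings are about the same, the build is probably not multithreaded, although on a dataset this small the speedup may be modest even with a working build.

```r
# Hypothetical helper: elapsed seconds for a short training run
# with the given number of threads.
time_fit <- function(nthread, x, y, nrounds = 20) {
  dtrain <- xgboost::xgb.DMatrix(data = x, label = y)
  params <- list(objective = "binary:logistic", nthread = nthread)
  system.time(
    xgboost::xgb.train(params = params, data = dtrain, nrounds = nrounds)
  )[["elapsed"]]
}

if (requireNamespace("xgboost", quietly = TRUE)) {
  # Synthetic data, large enough that threading can make a difference.
  set.seed(42)
  x <- matrix(rnorm(50000 * 50), ncol = 50)
  y <- as.numeric(x[, 1] + x[, 2] > 0)

  n_threads <- parallel::detectCores()
  if (is.na(n_threads)) n_threads <- 2L

  t_single <- time_fit(1, x, y)
  t_multi  <- time_fit(n_threads, x, y)

  cat("1 thread:", t_single, "s\n")
  cat(n_threads, "threads:", t_multi, "s\n")
}
```

Watching CPU usage in Task Manager during the second run gives the same information the question describes, but the timing comparison is easier to repeat after reinstalling.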

(Edited for extra clarity)

lucasjb
  • This is a good observation. But what is your question? Are you asking how to ensure that XGBoost uses all cores? Also, since you are talking about hardware considerations, you should probably provide some details like: your version/implementation of R, your OS details, your CPU/GPU details, and anything else you think might be relevant. – Mike Williamson Apr 21 '21 at 17:13