Some versions/builds of numpy have multithreaded execution of certain operations. There are a number of questions on StackOverflow about how to enable this feature. In theory, it is great. However, I would like to disable it.

The reason is that I am running some numpy code in a script that uses multiprocessing for parallelization. The default numpy multithreading does not seem very "smart": each process tries to use all of the cores on my machine, which quickly oversubscribes the CPU when several processes run at once. (Also, this is a shared machine, so it is just rude behavior in general.)

I am using the version of numpy that is currently installed by default using conda. Here is the information about the version of numpy that I end up with:

In [1]: import numpy

In [2]: numpy.__version__
Out[2]: '1.10.2'

In [3]: numpy.__config__.show()
lapack_opt_info:
    libraries = ['openblas']
    library_dirs = ['/home/mwaskom/anaconda/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_opt_info:
    libraries = ['openblas']
    library_dirs = ['/home/mwaskom/anaconda/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_info:
    libraries = ['openblas']
    library_dirs = ['/home/mwaskom/anaconda/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
openblas_lapack_info:
    libraries = ['openblas']
    library_dirs = ['/home/mwaskom/anaconda/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
blas_mkl_info:
  NOT AVAILABLE

When numpy is compiled with MKL, the number of threads can be controlled with an environment variable. That was the answer given to an earlier question. However, using the MKL builds through conda costs money (and the free academic option appears to have been discontinued). So I need to know how to control the multithreading behavior in the conda build shown above.

Ideally, there would be an environment variable, or some other option that lets me select the number of threads to use depending on what I am doing. Alternatively, is there a way to use conda to install a version of numpy that will not multithread?

mwaskom

1 Answer

It turns out that, for this OpenBLAS build, multithreading is controlled through the OPENBLAS_NUM_THREADS environment variable, so setting it to 1 keeps the BLAS calls serial.
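If you would rather set the variable from inside the script than in the shell, something like the sketch below should work. It assumes that OpenBLAS reads the variable when the library is first loaded, so the assignment has to happen before the first `import numpy` in the process. `limit_blas_threads` is a hypothetical helper name, not part of numpy or OpenBLAS.

```python
import os


def limit_blas_threads(n_threads=1):
    """Cap the OpenBLAS thread pool for this process and any children it starts.

    Hypothetical helper. Assumption: it must run before the first
    `import numpy`, because the variable is read when OpenBLAS loads.
    """
    os.environ["OPENBLAS_NUM_THREADS"] = str(n_threads)


limit_blas_threads(1)

# Import numpy only after the variable has been set, for example:
# import numpy as np
```

Worker processes started afterwards with multiprocessing inherit the parent's environment, so each of them should also stay at one BLAS thread.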

mwaskom
  • Note: if anyone is using the Math Kernel Library (MKL), they would need to set MKL_NUM_THREADS instead. See summary in https://stackoverflow.com/questions/74429606/python-script-on-pbs-fails-with-error-pbs-job-killed-ncpus-37-94-exceeded – Yair Daon Nov 14 '22 at 12:49
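If the same job may run against either an OpenBLAS or an MKL build, one option is to set both variables in the environment of the process you launch. The sketch below is only an illustration: `run_single_threaded` is a hypothetical helper, and the child command is a stand-in that echoes the variables instead of running real numpy code.

```python
import os
import subprocess
import sys


def run_single_threaded(cmd):
    """Run `cmd` with both BLAS thread-count variables set to 1 (hypothetical helper)."""
    env = dict(os.environ)
    env["OPENBLAS_NUM_THREADS"] = "1"  # read by OpenBLAS builds
    env["MKL_NUM_THREADS"] = "1"       # read by MKL builds
    return subprocess.run(
        cmd,
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


# Stand-in child that only prints the two variables; replace it with the
# command that runs your actual numpy script.
child = (
    "import os; "
    "print(os.environ['OPENBLAS_NUM_THREADS'], os.environ['MKL_NUM_THREADS'])"
)
result = run_single_threaded([sys.executable, "-c", child])
print(result.stdout.strip())
```

Setting the variables only for the child leaves the parent's own environment untouched, which is handy on a shared machine where other jobs may want different settings.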