61

It seems that my numpy library is using 4 threads, and setting OMP_NUM_THREADS=1 does not stop this.

numpy.show_config() gives me these results:

atlas_threads_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/lib64/atlas']
    define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')]
    language = f77
    include_dirs = ['/usr/include']
blas_opt_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/lib64/atlas']
    define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')]
    language = c
    include_dirs = ['/usr/include']
atlas_blas_threads_info:
    libraries = ['ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/lib64/atlas']
    define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')]
    language = c
    include_dirs = ['/usr/include']
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas']
    library_dirs = ['/usr/lib64/atlas']
    define_macros = [('ATLAS_INFO', '"\\"3.8.4\\""')]
    language = f77
    include_dirs = ['/usr/include']

So I know it is using blas, but I can't figure out how to make it use 1 thread for matrix multiplication.

drjrm3
  • 3
    [Atlas defines number of threads at compile time](http://math-atlas.sourceforge.net/faq.html#tnum) – jfs Jun 11 '15 at 23:15

5 Answers

73

Several multithreaded libraries are commonly used for numerical computation, including inside NumPy. Each one reads an environment variable that you can set before running the script to limit the number of CPUs it uses.

Try setting all of the following:

export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1

Sometimes it's a bit tricky to see where exactly multithreading is introduced.

Other answers show environment flags for other libraries. They may also work.
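If the script is launched from another Python process rather than from a shell, the same variables can be passed through the child's environment. This is only a minimal sketch, assuming the child is started with subprocess; `run_single_threaded` and the inline child command are made-up names for illustration:

```python
import os
import subprocess
import sys

# The three variables from this answer; other BLAS backends may read different ones.
THREAD_VARS = ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS")


def run_single_threaded(args):
    """Run a command with the thread-count variables set to 1 (hypothetical helper)."""
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env.update({name: "1" for name in THREAD_VARS})
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for "python my_script.py": the child just echoes one variable back.
    child = [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_single_threaded(child).stdout.strip())
```

Because the variables are set in the child's environment before the interpreter starts, this behaves like the `export` lines above and does not depend on import order inside the script.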

Jules G.M.
  • 11
    This needs more details. – Mohamad Elmasri Feb 03 '18 at 10:35
  • 1
    On Macs, `export VECLIB_MAXIMUM_THREADS=1`; see [performance-of-numpy-with-different-blas-implementations](https://stackoverflow.com/questions/26511430/performance-of-numpy-with-different-blas-implementations) on SO. Beware: `man Accelerate` says "the value of VECLIB_MAXIMUM_THREADS may be cached by the library and reused; if you need to ensure single-threaded execution, you should set VECLIB_MAXIMUM_THREADS before making any Accelerate calls" – denis Mar 13 '18 at 13:15
  • 1
    What is the best way to do this _from within the script_? Should I just run `os.system("export OMP_NUM_THREADS=1")`? I've read mixed reviews of whether or not that will work correctly, and even if it does, is that the best way to do this? It seems a little hacky for some reason. – seth127 Apr 26 '18 at 17:22
  • 32
    For anyone else who had the same question as me above, per this thread `http://numpy-discussion.10968.n7.nabble.com/Set-threads-from-within-python-code-td44108.html` you can do `os.environ["OMP_NUM_THREADS"] = "1"` etc but you have to put that *before* you have `import numpy`. Apparently numpy only checks for this at import. – seth127 Apr 30 '18 at 20:26
  • 3
    @seth127 You should make this an answer IMO, it's useful. – kηives Aug 14 '18 at 15:02
  • 1
    I agree. The comment by Seth should be an accepted answer – Mikhail Genkin Sep 11 '18 at 20:37
  • In 2022, `os.environ['MKL_NUM_THREADS'] = '1'`, `os.environ['NUMEXPR_NUM_THREADS'] = '1'`, and `os.environ['OMP_NUM_THREADS'] = '1'` work even if added after `import numpy` – Sahil Chimnani Mar 14 '22 at 16:59
62

There are more than the three environment variables mentioned. The following is the complete list of environment variables, each with the package that uses it to control the number of threads it spawns. Note that you need to set these variables before doing import numpy:

OMP_NUM_THREADS: openmp,
OPENBLAS_NUM_THREADS: openblas,
MKL_NUM_THREADS: mkl,
VECLIB_MAXIMUM_THREADS: accelerate,
NUMEXPR_NUM_THREADS: numexpr

So in practice you can do:

import os
os.environ["OMP_NUM_THREADS"] = "4" # export OMP_NUM_THREADS=4
os.environ["OPENBLAS_NUM_THREADS"] = "4" # export OPENBLAS_NUM_THREADS=4 
os.environ["MKL_NUM_THREADS"] = "6" # export MKL_NUM_THREADS=6
os.environ["VECLIB_MAXIMUM_THREADS"] = "4" # export VECLIB_MAXIMUM_THREADS=4
os.environ["NUMEXPR_NUM_THREADS"] = "6" # export NUMEXPR_NUM_THREADS=6

Note that as of November 2018 the Numpy developers are working on making this possible to do after you do import numpy as well. I'll update this post once they commit those changes.
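To avoid repeating the same line five times, the assignments can be folded into a loop. This is only a sketch: `limit_threads` is a hypothetical helper, and it assumes you want the same limit for every library and that you call it before `import numpy`:

```python
import os

# Variable names taken from the list in this answer.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def limit_threads(n=1):
    """Set every thread-count variable to n (hypothetical helper)."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)  # environment values must be strings


limit_threads(1)
# import numpy  # import numpy only after the variables are set
```

Setting a variable that the installed backend does not read is harmless, so setting all five is a reasonable default when you are not sure which library numpy is linked against.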

Amir
32

To do this from within a Python script rather than at the bash prompt, per this thread you can do the following (same variables as the answer above):

import os
os.environ["MKL_NUM_THREADS"] = "1" 
os.environ["NUMEXPR_NUM_THREADS"] = "1" 
os.environ["OMP_NUM_THREADS"] = "1" 

but you have to put that before you do import numpy. Apparently numpy only checks for this at import.

(This is reposted as an answer based on @kηives's comment above.)
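Because the order matters, it can help to get a warning when numpy was imported first. A rough sketch, assuming that a check of `sys.modules` is good enough for your setup; `set_threads_before_numpy` is a made-up name, not part of numpy:

```python
import os
import sys
import warnings


def set_threads_before_numpy(n=1, modules=None):
    """Set the variables; return False if numpy was already imported (hypothetical helper)."""
    modules = sys.modules if modules is None else modules
    too_late = "numpy" in modules
    if too_late:
        warnings.warn("numpy is already imported; these settings may be ignored")
    for name in ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS"):
        os.environ[name] = str(n)
    return not too_late
```

The `modules` parameter exists only so the check can be exercised without actually importing numpy; in a script you would call `set_threads_before_numpy(1)` at the very top.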

seth127
  • 5
    I believe there's also OPENBLAS_NUM_THREADS if you've linked against OpenBLAS. – dga Sep 14 '18 at 18:01
  • Actually, this is being set not when you import numpy, but when you first invoke MKL-optimized Numpy code. For example, np.ones() does not trigger reading these variables, but np.dot() does. – Maxim Imakaev Nov 27 '18 at 15:35
17

After trying a number of the solutions above without luck, I found a reference to threadpoolctl in the Numpy docs. This worked and it can be used even if numpy is already imported.

from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1, user_api='blas'):
    ...  # single-threaded numpy code goes here

Just make sure to use the user_api which is listed when you do:

from threadpoolctl import threadpool_info
from pprint import pprint
import numpy
pprint(threadpool_info())
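
If wrapping code in a with block is inconvenient, my understanding of the threadpoolctl documentation is that `threadpool_limits` can also be used as a plain call, in which case the limit stays in place until it is changed again. A hedged sketch under that assumption; `limit_blas_globally` is a hypothetical wrapper, and it should be called after numpy is imported, since threadpoolctl only acts on libraries already loaded into the process:

```python
try:
    from threadpoolctl import threadpool_limits
except ImportError:  # threadpoolctl is a third-party package and may be missing
    threadpool_limits = None


def limit_blas_globally(n=1):
    """Apply a BLAS thread limit for the rest of the process (hypothetical helper)."""
    if threadpool_limits is None:
        return None  # nothing we can do without threadpoolctl
    # Constructing the object applies the limit immediately; no with block needed.
    return threadpool_limits(limits=n, user_api="blas")


# In a real script, put "import numpy" above this line.
limiter = limit_blas_globally(1)
```

Keeping the returned object around is optional; it is only needed if you later want to undo the limit.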
Josh Broomberg
  • Is there any way to set this once, at the beginning of a notebook or script? I don't want to have to change my codebase to wrap parts of it in this context manager, and I don't want to try to wrap an entire script under this with statement (won't work with e.g. an iPython notebook split over multiple cells). – MRule Feb 03 '21 at 10:57
  • @MRule you are best off setting the environment variables all to '1' before executing your script in that case (see accepted answer). – Gerard Jan 31 '23 at 16:32
3

I was able to fix this at run-time the following way:

import mkl
mkl.set_num_threads(1)

I use the following code to make this snippet less likely to cause problems in scripts/packages:

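try:
    import mkl  # optional dependency; not every numpy build ships it
    mkl.set_num_threads(1)
except ImportError:
    pass  # mkl module unavailable, so there is nothing to limit here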
The Unfun Cat