64

I have to run jobs on a regular basis on compute servers that I share with others in the department. When I start 10 jobs, I would like them to take just 10 cores and no more; I don't care if a run takes a bit longer on a single core. I just don't want the jobs to encroach on others' territory, which would require me to renice them and so on. I just want 10 solid cores and that's all.

I am using Enthought 7.3-1 on Red Hat, which is based on Python 2.7.3 and numpy 1.6.1, but the question is more general.

  • I'm pretty sure that numpy doesn't do any multithreading; there is nothing to switch off. – Winston Ewert Jun 11 '13 at 20:57
  • Set CPU affinity for the processes. – jfs Jun 11 '13 at 20:59
  • @WinstonEwert: incorrect. Try `np.dot` with a large matrix on a multicore CPU. The libraries it uses may utilize more than one CPU. – jfs Jun 11 '13 at 20:59
  • Thanks a lot. Now that I know what to search for, I found this other page that seems to answer my question: http://stackoverflow.com/questions/1575067/python-multiprocessing-restrict-number-of-cores-used – MasDaddy Jun 11 '13 at 21:14

5 Answers

56

Hopefully this covers all the scenarios and systems you may be on.

  1. Use `numpy.__config__.show()` to see whether you are using OpenBLAS or MKL.

From this point on there are a few ways you can do this.

2.1. The terminal route: `export OPENBLAS_NUM_THREADS=1` or `export MKL_NUM_THREADS=1`.

2.2. (This is my preferred way.) In your Python script, import os and add the line `os.environ['OPENBLAS_NUM_THREADS'] = '1'` or `os.environ['MKL_NUM_THREADS'] = '1'` (see the sketch after these steps).

NOTE: when setting `os.environ[VAR]`, the number of threads must be a string! Also, you may need to set this environment variable before importing numpy/scipy.

There are probably other options besides OpenBLAS or MKL, but step 1 will help you figure that out.
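
As a minimal sketch of both steps together (assuming an OpenBLAS or MKL build of numpy; the exact output of the config call varies by build):

import os

# Cap the BLAS thread pool *before* numpy is imported; setting both
# variables is harmless, only the one matching your backend takes effect.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

import numpy as np

# Step 1: inspect which BLAS/LAPACK backend this build links against.
np.__config__.show()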

  • Amazing. When trying to parallelize a batch of fftpack+odeint simulations with multiprocessing.Pool, this gave me an up to *600x speedup*! Yet, the speedup was in effect even before I came to the multiprocessing parts of my notebook. Something about those OpenBLAS threads was apparently blocking proper vectorization. – tsbertalan Apr 17 '19 at 21:22
  • Another amazing thing was that the utilization reported by htop was actually lower after the change, so it really seems like some different code path must be used; perhaps something that makes better use of my Xeon E5-1660v4's vector extensions. – tsbertalan Apr 17 '19 at 21:23

41

Set the MKL_NUM_THREADS environment variable to 1. As you might have guessed, this environment variable controls the behavior of the Math Kernel Library, which is included as part of Enthought's numpy build.

I just do this in my startup file, .bash_profile, with `export MKL_NUM_THREADS=1`. You should also be able to do it from inside your script to make it process-specific, as in the sketch below.
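
If you launch the jobs from a Python driver rather than the shell, here is a sketch of the process-specific route (the worker.py script name is hypothetical):

import os
import subprocess

# Each job inherits an environment in which MKL is capped at one thread.
env = dict(os.environ, MKL_NUM_THREADS='1')
jobs = [subprocess.Popen(['python', 'worker.py', str(i)], env=env)
        for i in range(10)]
for job in jobs:
    job.wait()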

21

If you want to set the number of threads dynamically, rather than globally via an environment variable, you can also do:

import mkl  # provided by the separate mkl-service package
mkl.set_num_threads(2)
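
This only works with an MKL-backed numpy. A small sketch of lowering the limit temporarily and restoring it afterwards (assuming mkl-service is installed):

import mkl

saved = mkl.get_max_threads()  # remember the current limit
mkl.set_num_threads(1)
# ... run the section that should stay on one core ...
mkl.set_num_threads(saved)     # restore for the rest of the program
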
12

In more recent versions of numpy I have found it necessary to also set NUMEXPR_NUM_THREADS=1.

In my hands, this is sufficient without setting MKL_NUM_THREADS=1, but under some circumstances you may need to set both.
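
A belt-and-braces sketch that sets every variable mentioned in these answers before the first numpy import (which of them actually matters depends on your build; the unused ones are simply ignored):

import os

for var in ('NUMEXPR_NUM_THREADS', 'MKL_NUM_THREADS',
            'OMP_NUM_THREADS', 'OPENBLAS_NUM_THREADS'):
    os.environ[var] = '1'

import numpy as np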

  • FWIW, I needed to set `OMP_NUM_THREADS` (cf. https://stackoverflow.com/a/31622299/1666398). – dtk Jun 08 '17 at 21:14

-9

For me, the solution was simply to stop using numpy.dot:

import numpy as np

a = np.random.rand(10**6)      # rand() expects integer sizes, not 1e6
b = np.random.rand(10**6, 10)

# potentially uses multiple threads via the BLAS library
dotted = np.dot(a, b)

# single-threaded, but materializes a large temporary array
summed = np.sum(a[:, np.newaxis] * b, axis=0)

# the two summation orders differ, so compare up to rounding error
assert np.allclose(dotted, summed)
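
Note the trade-off: the broadcasted product allocates a temporary array as large as b, so this is typically slower and more memory-hungry than np.dot; it just avoids the multithreaded BLAS path.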