
I am about to write some computationally-intensive Python code that'll almost certainly spend most of its time inside numpy's linear algebra functions.

The problem at hand is embarrassingly parallel. Long story short, the easiest way for me to take advantage of that would be by using multiple threads. The main barrier is almost certainly going to be the Global Interpreter Lock (GIL).

To help design this, it would be useful to have a mental model for which numpy operations can be expected to release the GIL for their duration. To this end, I'd appreciate any rules of thumb, dos and don'ts, pointers etc.

In case it matters, I'm using 64-bit Python 2.7.1 on Linux, with numpy 1.5.1 and scipy 0.9.0rc2, built with Intel MKL 10.3.1.

NPE
    Have you considered using the [`multiprocessing`](http://docs.python.org/library/multiprocessing.html) lib instead of threads? You wouldn't have to worry about the GIL anymore. – Jeannot Jun 01 '11 at 11:45
  • @Jeannot: I have, thanks. Due to the nature of the problem, threading is my first choice. If I can't make it work, I'll look at the alternatives. – NPE Jun 01 '11 at 11:51

3 Answers


Quite a few numpy routines release the GIL, so they can run efficiently in parallel threads (info). Maybe you don't need to do anything special!

You can use this question to find whether the routines you need are among the ones that release the GIL. In short, search for `ALLOW_THREADS` or `nogil` in the source.

(Also note that MKL can use multiple threads within a single routine, so that's another easy way to get parallelism, although possibly not the fastest kind.)
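As a sketch of the "do nothing special" approach: if the routine you call releases the GIL, plain `threading` is enough. The sizes and thread count below are arbitrary, chosen only to illustrate the pattern:

```python
import threading
import numpy as np

def dot_task(a, b, out, idx):
    # np.dot on float arrays dispatches to BLAS, which releases the
    # GIL for the duration of the call, so several of these threads
    # can genuinely run on separate cores at once.
    out[idx] = a.dot(b)

a = np.random.rand(300, 300)
b = np.random.rand(300, 300)
results = [None] * 4

threads = [threading.Thread(target=dot_task, args=(a, b, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All four products are identical here since the inputs are shared; in a real embarrassingly parallel job each thread would get its own inputs.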

blambert
Mark
    info link is broken, but it was a web page with a copy of this stackoverflow answer: https://stackoverflow.com/a/36480941/1224627 – wingedsubmariner Feb 04 '20 at 22:29

You will probably find answers to all your questions regarding NumPy and parallel programming on the official wiki.

Also, have a look at this recipe page -- it contains example code showing how to use NumPy with multiple threads.
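The usual pattern such recipes demonstrate is splitting an embarrassingly parallel job across threads, each working on its own slice of the array. A rough sketch in that spirit (illustrative only, not the recipe's actual code):

```python
import threading
import numpy as np

def process_chunk(chunk, out, idx):
    # Elementwise ufuncs like np.sqrt are among the routines that
    # release the GIL (see NPY_ALLOW_THREADS in the numpy source),
    # so per-chunk work can overlap across threads.
    out[idx] = np.sqrt(chunk).sum()

data = np.random.rand(1000000)
n_threads = 4
chunks = np.array_split(data, n_threads)
partial = [0.0] * n_threads

threads = [threading.Thread(target=process_chunk, args=(c, partial, i))
           for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(partial)
```

Note that the slices from `np.array_split` are views, so no copying is involved in handing work to the threads.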

myaut
Ferdinand Beyer
    I had a look at the wiki page, and there is absolutely no information about which numpy functions do and do not release the GIL. – DanielSank Jun 04 '14 at 01:46
  • 7
    It would be nice if the answer actually included an answer, rather than just link which may or may not at some point in time reference a valid answer. https://meta.stackexchange.com/questions/8231/are-answers-that-just-contain-links-elsewhere-really-good-answers – zvone May 24 '18 at 15:40
  • 1
    From the wiki: "During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C, another thread can run - and if you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads." – joerick Feb 28 '20 at 13:07
  • The linked wiki page says that the GIL is released "while numpy is doing an array operation". For a more complete answer, see https://stackoverflow.com/questions/36479159. – mhsmith Jun 11 '20 at 00:11

Embarrassingly parallel? Numpy? Sounds like a good candidate for PyCUDA or PyOpenCL.

Ferdinand Beyer
dwelch91
    Doesn't sound like this is a good GPU problem since each thread will be doing linear algebra. There are GPU linear algebra packages though. A friend of mine has recently compiled scipy using ACML-GPU's version of LAPACK. – kiyo Jun 02 '11 at 15:54
  • 1
    Theano might be a better choice for numpy-related applications. – Uli Köhler Dec 20 '15 at 03:58