I have some Cython code that I'd like to run as quickly as possible. Do I need to release the GIL in order to do this?

Let's suppose my code is similar to this:

import numpy as np

# trivial definition just for illustration!
cdef double some_complicated_function(double x) nogil:
    return x

cdef void func(double[:] input) nogil:
    cdef double[:] array = np.zeros_like(input)
    for i in range(array.shape[0]):
        array[i] = some_complicated_function(input[i])

I get a whole load of error messages from the np.zeros_like line similar to:

nogilcode.pyx:7:40: Calling gil-requiring function not allowed without gil
nogilcode.pyx:7:29: Accessing Python attribute not allowed without gil
nogilcode.pyx:7:27: Accessing Python global or builtin not allowed without gil
nogilcode.pyx:7:40: Constructing Python tuple not allowed without gil
nogilcode.pyx:7:41: Converting to Python object not allowed without gil

Do I need to find a way of calling np.zeros_like without the GIL? Or find some other way of allocating an array that doesn't require the GIL?


Note: this is a self-answered question designed to clear up a common misunderstanding about Cython and the GIL (although you're welcome to answer it too, of course!).

Second note: I've contributed enough to Cython that I should note it here (given that I'm bringing the topic up)

DavidW
  • You can use numpy's C-API from cython for a modest speed up compared to making the python call `zeros_like` and since you fill every value, you can make an `empty` array instead of a `zeros` array. – MaxNoe Jan 22 '21 at 09:32
  • Good point, that'd probably help a bit. I should say, this question was more about when nogil is beneficial in Cython than `np.zeros_like` specifically. Lots of people seem to start from the premise that everything needs to be `nogil` without having a real reason, and I wanted to write a good answer to refer them to. – DavidW Jan 22 '21 at 10:23
  • I'm having the same questions, i.e. using `np.sum(a_mem_view)` inside `cdef` function might not release GIL. DavidW you are such a hero and addressed so many cython questions, good job man! – avocado Aug 26 '21 at 23:08

1 Answer

No - you probably don't need to release the GIL.

The basic function of the GIL (global interpreter lock) is to ensure that Python's internal mechanisms are not subject to race conditions, by ensuring that only one Python thread is able to run at once. However, simply holding the GIL does not slow your code down.

The two (related) occasions when you should release the GIL are:

  1. Using Cython's parallelism mechanism. The contents of a `prange` loop, for example, are required to be `nogil`.

  2. If you want other (external) Python threads to be able to run at the same time.

    a. If you have a large computationally/IO-intensive block that doesn't need the GIL then it may be "polite" to release it, just to benefit users of your code who want to do multi-threading. However, this is helpful rather than necessary.

    b. Very occasionally, it's useful to briefly release the GIL with a short `with nogil: pass` block. Cython doesn't release the GIL spontaneously (unlike Python), so if you're waiting on another Python thread to complete a task, this can avoid deadlocks. This sub-point probably doesn't apply to you unless you're compiling GUI code with Cython.
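
As a rough sketch of cases 1 and 2a, reusing `some_complicated_function` from the question: the names `func_parallel` and `func_polite` are made up for illustration, and the `prange` version assumes the module is compiled with OpenMP enabled (for example `-fopenmp` with GCC); without it the loop simply runs serially.

```cython
import numpy as np
from cython.parallel import prange

# trivial stand-in, as in the question
cdef double some_complicated_function(double x) nogil:
    return x

def func_parallel(double[:] input):
    # Allocation calls into Python, so it is done while the GIL is held.
    cdef double[:] array = np.zeros_like(input)
    cdef Py_ssize_t i
    # Case 1: the prange body must be nogil; nogil=True releases the GIL
    # for the duration of the loop.
    for i in prange(array.shape[0], nogil=True):
        array[i] = some_complicated_function(input[i])
    return np.asarray(array)

def func_polite(double[:] input):
    cdef double[:] array = np.zeros_like(input)
    cdef Py_ssize_t i
    # Case 2a: a serial loop that doesn't need the GIL, so release it
    # to let other Python threads run in the meantime.
    with nogil:
        for i in range(array.shape[0]):
            array[i] = some_complicated_function(input[i])
    return np.asarray(array)
```

In both functions only the loop runs without the GIL; the NumPy calls on either side of it are ordinary Python calls.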


The sort of Cython code that can run without the GIL (no calls to Python, purely C-level numeric operations) is often the sort of code that runs efficiently. This sometimes gives people the impression that the inverse is true and the trick is releasing the GIL, rather than the actual code they're running. Don't be misled by this - your (single-threaded) code will run at the same speed with or without the GIL.

Therefore, if you have a nice fast NumPy function that does exactly what you want on a big chunk of data, but can only be called with the GIL, then just call it - no harm is done!
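
Applied to the code in the question, a minimal sketch would simply drop `nogil` from the signature of `func`, so that `np.zeros_like` is called with the GIL held. Returning the array is an addition for illustration; the original function discarded it.

```cython
import numpy as np

# trivial definition just for illustration, as in the question
cdef double some_complicated_function(double x) nogil:
    return x

# No nogil on the signature: the GIL is held throughout, so the
# Python-level call to np.zeros_like compiles without errors.
cdef double[:] func(double[:] input):
    cdef double[:] array = np.zeros_like(input)
    cdef Py_ssize_t i
    # Typed memoryview indexing is still compiled to C-level code;
    # holding the GIL does not make this loop any slower.
    for i in range(array.shape[0]):
        array[i] = some_complicated_function(input[i])
    return array
```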


As a final point: even within a nogil block (for example a prange loop) you can always get the GIL back if you need it:

with gil:
    ... # small block of GIL requiring code goes here

Try not to do this too often (getting/releasing it takes time, and of course only one thread can be running this block at once) but equally it's a good way of doing small Python operations where needed.
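
For instance, here is a sketch of a `prange` loop that only takes the GIL back on a rare path where it has something to report. The function `sum_and_report` and its reporting condition are invented for illustration.

```cython
from cython.parallel import prange

def sum_and_report(double[:] data):
    cdef Py_ssize_t i
    cdef double total = 0
    for i in prange(data.shape[0], nogil=True):
        # The in-place add makes total a reduction variable.
        total += data[i]
        if data[i] < 0:
            # Rare path: briefly reacquire the GIL for a Python call.
            # Only one thread at a time can be inside this block.
            with gil:
                print("negative value at index", i)
    return total
```

If most iterations ended up in the `with gil:` block, the threads would spend their time queuing for the GIL and the loop would effectively run serially.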

DavidW
  • "This sometimes gives people the impression that the inverse is true and the trick is releasing the GIL, rather than the actual code they're running." How confident are you in that statement? I remember implementing tree structures, and at some point going to "nogil" in the most basic function calls (and nothing else) significantly sped up my code, which wasn't parallel or threaded by me or anything. It's been a while so I might be wrong. – oli Jun 09 '21 at 08:34
  • @Oli I'm struggling to think of a mechanism where it could make much difference so "fairly confident". However, I've been wrong about things before! If you have a counter-example I'd be interested to see it – DavidW Jun 09 '21 at 21:15
  • Hi DavidW, thanks for providing such detailed answer! One dumb question, Cython code using Numpy functions don't seem to be faster than plain Py Numpy version. My use-case is implementing some ML algorithm in Cython with Numpy functions, and I wanted to use Numpy broadcasting functions (e.g. `np.log`, `np.exp`, `np.max`), because they are quite handy to use. And by `%timeit`, the Cython version is almost same as Py version, is this expected? – avocado Aug 26 '21 at 23:34
  • @avocado yes that is expected. Cython cannot look inside Numpy functions to speed them up (and they are normally pretty fast internally so there is little to gain anyway) – DavidW Aug 27 '21 at 05:40
  • @oli: I'm basically 100% certain that a single-threaded Python program would gain nothing by using `nogil`. The GIL is basically free when uncontended (that's why they used a GIL; it was the easiest way to enable safe threading that didn't impose a penalty on single-threaded programs). The cost of releasing and reacquiring the GIL is pretty trivial if you're single threaded (acquiring/releasing uncontended locks is as close to free as you can get on most OSes), so using `nogil` when you're not threaded is harmless, but it would only speed you up by accident (aligning code better for caching). – ShadowRanger Nov 19 '21 at 18:48