
I have a C extension that is called from my multithreaded Python application. I use a static variable i in a C function, and a few i++ statements later on can be run from different Python threads (the variable is only used in my C code, though; I don't expose it to Python).

For some reason I haven't met any race condition so far, but I wonder if it's just luck...

I don't have any thread-related C code (no Py_BEGIN_ALLOW_THREADS or anything).

I know that the GIL only guarantees single bytecode instructions to be atomic and thread-safe, so statements such as i+=1 in Python are not thread-safe.

But I don't know about an i++ statement in a C extension. Any help?

DenverCoder9
  • "I know that the GIL only guarantees single bytecode instructions to be atomic and thread-safe" - it doesn't even guarantee that. Your `i++` in C should be fine, though; the GIL can't be released in the middle of that. C code won't release the GIL unless it makes an explicit call to give other threads a chance to run (but be careful about calls to code you don't control, which might make that call for you). – user2357112 Feb 02 '17 at 15:44
  • Wow, I'm even more confused now. I read [here](http://stackoverflow.com/questions/1717393/is-the-operator-thread-safe-in-python) that single bytecode instructions are thread-safe... And what do you mean, C code will not ever release the GIL unless explicitly told to ? Like, even if I put a `sleep` or some wait/IO instruction ? Once you enter C code, it's just one single atomic execution ? – DenverCoder9 Feb 03 '17 at 10:03
  • Common misconception, but no, they're not threadsafe, most obviously because a single `BINARY_ADD` or whatever opcode could resolve to an arbitrary user-defined function written in Python. You have to be sure that executing the opcode can't lead to the invocation of other Python code, and that any C code involved won't explicitly release the GIL. – user2357112 Feb 03 '17 at 15:47
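
To illustrate the point in the last comment, here is a minimal sketch showing that the single opcode behind `a + b` can end up running arbitrary Python code through `__add__`. The `Noisy` class and the `add` and `add_opcodes` helpers are made up for this example; the exact opcode name depends on the Python version.

```python
import dis

calls = []  # records every time the user-defined __add__ runs


class Noisy:
    """Hypothetical class whose + operator runs arbitrary Python code."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Anything could happen here: I/O, locks, calls into other code...
        calls.append((self.value, other))
        return Noisy(self.value + other)


def add(a, b):
    return a + b


def add_opcodes():
    """Return the names of the bytecode instructions compiled for add()."""
    return [ins.opname for ins in dis.get_instructions(add)]


result = add(Noisy(1), 2)  # one binary-add opcode, but __add__ ran inside it
print(add_opcodes())
print(calls)
```

Since `__add__` is ordinary Python code, the interpreter can switch threads while it runs, so the addition as a whole is not atomic.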

2 Answers

Python will not release the GIL while you are running C code (unless you either tell it to or cause the execution of Python code; see the warning note at the bottom!). The interpreter only releases the GIL between bytecode instructions (not during one), and from the interpreter's point of view running a C function is part of executing the CALL_FUNCTION bytecode.* (Unfortunately I can't find a reference for this paragraph at the moment, but I'm almost certain it's right.)

Therefore, unless you do something specific, your C code will be the only thread running, and thus any operation you do in it should be thread safe.

If you specifically want to release the GIL - for example because you're doing a long calculation which doesn't interfere with Python, reading from a file, or sleeping while waiting for something else to happen - then the easiest way is to put Py_BEGIN_ALLOW_THREADS before that code and Py_END_ALLOW_THREADS after it, at the point where you want the GIL back. Inside this block you cannot use most Python API functions, and it's your responsibility to ensure thread safety in C. The easiest way to do this is to only use local variables and not read or write any global state.
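
A rough way to see the difference between holding and releasing the GIL around a C call, without writing an extension, is `ctypes`: according to its documentation, functions loaded through `ctypes.PyDLL` are called with the GIL held, while `ctypes.CDLL` releases it for the duration of the call. The sketch below assumes a POSIX system where `usleep` can be found via `dlopen(NULL)` and a standard GIL-enabled CPython build. `ticks_during_c_sleep` is a hypothetical helper name, and the counts it returns are timing dependent.

```python
import ctypes
import threading
import time


def ticks_during_c_sleep(lib, seconds=0.5):
    """Count how often a background Python thread runs while lib.usleep() blocks.

    Assumption: seconds stays below 1, since some systems reject larger usleep values.
    """
    usleep = lib.usleep
    usleep.argtypes = [ctypes.c_uint]
    usleep.restype = ctypes.c_int

    ticks = 0
    stop = threading.Event()

    def worker():
        nonlocal ticks
        while not stop.is_set():
            ticks += 1          # needs the GIL
            time.sleep(0.01)    # releases the GIL while sleeping

    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(0.05)            # give the worker time to start
    before = ticks
    usleep(int(seconds * 1_000_000))  # blocking C call
    after = ticks
    stop.set()
    thread.join()
    return after - before


held = ticks_during_c_sleep(ctypes.PyDLL(None))     # GIL held during the C call
released = ticks_during_c_sleep(ctypes.CDLL(None))  # GIL released during the C call
print(f"worker ticks with GIL held: {held}, with GIL released: {released}")
```

With the GIL held, the worker should barely advance while the C function sleeps; with the GIL released it keeps ticking. This is roughly what an extension function does when it wraps a blocking call in Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS.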

If you've already got a C thread running without the GIL (thread A) then simply holding the GIL in thread B does not guarantee that thread A won't modify C global variables. To be safe you need to ensure that you never modify global state without some kind of locking mechanism (either the Python GIL or a C mechanism) in all your C functions.


Additional thought

* One place where the GIL can be released in C code is if the C code calls something that causes Python code to be executed. This might be through using PyObject_Call. A less obvious place would be if Py_DECREF caused a destructor to be executed. You'd have the GIL back by the time your C code resumed, but you could no longer guarantee that global objects were unchanged. This obviously doesn't affect simple C like x++.


Belated Edit:

It should be emphasised that it's really, really, really easy to cause the execution of Python code. For this reason you shouldn't use the GIL in place of a mutex or actual locking mechanism. You should only consider it for operations that are really atomic (i.e. a single C API call) or that work entirely on non-Python C objects. You won't lose the GIL unexpectedly while executing C code, but a lot of C API calls may release the GIL, do something else, and then regain the GIL before returning to your C code.

The purpose of the GIL is to make sure that the Python internals don't get corrupted. The GIL will continue to serve this purpose within an extension module. However, race conditions that involve valid Python objects arranged in ways you don't expect are still available to you. For example:

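PySequence_SetItem(some_list, 0, some_item);
// The previous contents of item 0 are released here; their destructor may run
// Python code and release the GIL, letting another thread modify some_list.
PyObject* item = PySequence_GetItem(some_list, 0); // returns a new reference
assert(item == some_item); // may not be true
Py_XDECREF(item); // drop the reference obtained from PySequence_GetItem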
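
The same kind of surprise can be reproduced from pure Python, even in a single thread. This is only a sketch: it relies on CPython's reference counting running `__del__` as soon as the last reference disappears, and the `Saboteur` class is invented for the illustration.

```python
class Saboteur:
    """Hypothetical object whose destructor mutates the list that held it."""

    def __init__(self, target, replacement):
        self.target = target
        self.replacement = replacement

    def __del__(self):
        # Arbitrary Python code running in the middle of the list assignment.
        self.target[0] = self.replacement


some_list = []
some_list.append(Saboteur(some_list, "intruder"))  # the list holds the only reference

some_item = "expected"
some_list[0] = some_item  # drops the Saboteur, so its __del__ runs here
item = some_list[0]

print(item is some_item)  # False in CPython: the destructor overwrote the slot
```

In a multithreaded program the destructor could just as well let another thread run, which is why the GIL alone doesn't make a sequence of C API calls atomic.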
DavidW
  • That's pretty big, I didn't know that from the interpreter point of view, calls to C extensions functions were atomic. That means if you intend to spend some time in your C extension, you have to explicitly tell your code to release the GIL, or you won't be able to even let other threads make calls that wait for IO completion ? Thanks for the insight anyway. Is there any documentation about that ? I didn't find anything. – DenverCoder9 Feb 03 '17 at 13:00
  • The logic is that you need to hold the GIL to use any Python API call (it often segfaults if you don't). If Python were to release the GIL by itself then you could never be sure anything was safe. Therefore it relies on you making your own judgement about when you don't need the GIL. While you have the GIL nothing new will start, but if other threads are _already_ waiting for IO completion then they will continue doing that in the background while your C function runs (but they won't do anything Pythony until you give up the GIL). – DavidW Feb 03 '17 at 13:18
  • I'm struggling a bit for good documentation that explicitly says that I'm afraid. If I find some I'll link it. It's easily tested though: set a bunch of threads running that print "Hello from thread A/B/C..." at regular intervals, then create another thread which calls a C function that goes to sleep for a minute. – DavidW Feb 03 '17 at 13:21
  • Here's a reference for the first paragraph: https://docs.python.org/3/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe "Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program." – rohitjv Jul 30 '21 at 21:52

Python C Extension Patterns has an excellent section covering Thread Safety.

Yes, this is an old post but it came up first in my google search so this answer may be useful to others.

philip