
I'm slightly confused about whether multithreading works in Python or not.

I know there have been a lot of questions about this and I've read many of them, but I'm still confused. I know from my own experience, and have seen others post their own answers and examples here on StackOverflow, that multithreading is indeed possible in Python. So why does everyone keep saying that Python is locked by the GIL and that only one thread can run at a time? It clearly does work. Or is there some distinction I'm not getting here?

Many posters/respondents also keep mentioning that threading is limited because it does not make use of multiple cores. But I would say threads are still useful because they work simultaneously and thus get the combined workload done faster. I mean, why would there even be a Python thread module otherwise?

Update:

Thanks for all the answers so far. The way I understand it is that multithreading will only run in parallel for some I/O tasks, while CPU-bound tasks can only run one thread at a time, regardless of the number of cores.

I'm not entirely sure what this means for me in practical terms, so I'll just give an example of the kind of task I'd like to multithread. For instance, let's say I want to loop through a very long list of strings and I want to do some basic string operations on each list item. If I split up the list, send each sublist to be processed by my loop/string code in a new thread, and send the results back in a queue, will these workloads run roughly at the same time? Most importantly will this theoretically speed up the time it takes to run the script?
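For concreteness, the split-and-queue setup I mean might look like this (a minimal sketch; `transform` is a hypothetical stand-in for whatever string operations each item needs):

```python
import threading
import queue

def transform(s):
    # hypothetical stand-in for the basic string operations per item
    return s.strip().upper()

def worker(sublist, results):
    # process one chunk and send the result back through the queue
    results.put([transform(s) for s in sublist])

strings = ["foo ", " bar", "baz "] * 1000
n_threads = 4
chunk = (len(strings) + n_threads - 1) // n_threads
results = queue.Queue()

threads = [
    threading.Thread(target=worker,
                     args=(strings[i * chunk:(i + 1) * chunk], results))
    for i in range(n_threads)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

processed = []
while not results.empty():
    processed.extend(results.get())

print(len(processed))  # 3000: all items come back through the queue
```

Whether this actually finishes faster than a single loop is exactly what I'm asking about.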

Another example might be: can I render and save four different pictures using PIL in four different threads, and have this be faster than processing the pictures one after another? I guess this speed component is what I'm really wondering about, rather than what the correct terminology is.

I also know about the multiprocessing module but my main interest right now is for small-to-medium task loads (10-30 secs) and so I think multithreading will be more appropriate because subprocesses can be slow to initiate.

Laurel
Karim Bahgat
  • This is a pretty loaded question. I think the answer lies in *what* you want to have the threads do. Under most circumstances the GIL prevents more than 1 thread from running simultaneously. However, there are a few cases where the GIL is released (e.g. reading from a file) so that can be done in parallel. Also note that the GIL is an *implementation detail* of CPython (the most common implementation). No other implementation of Python (Jython, PyPy, etc.) has a GIL (AFAIK) – mgilson Jan 05 '14 at 21:23
  • @mgilson PyPy has a GIL. –  Jan 05 '14 at 21:30
  • @delnan -- You appear to be correct. Thanks. – mgilson Jan 05 '14 at 21:39
  • "subprocesses can be slow to initiate" -- you could create a pool of tasks ready to execute. The overhead can be limited to roughly the amount of time it takes to serialize/deserialize the data required for the task to start working. – Brian Cain Jan 06 '14 at 13:53
  • Processes take some time to start, but not on the order of seconds. The time to be pickled, sent and unpickled is a more likely bottleneck, but even this is hard to tell without trying. –  Jan 06 '14 at 14:11
  • I might have exaggerated when I said processes can take seconds to start. I just tried again with a print statement after running each p.start() inside a for loop, with about 0.5 seconds of lag in between each, so not as bad as I thought, but still noticeable for very small tasks. Maybe I'll reconsider multiprocessing. – Karim Bahgat Jan 06 '14 at 16:46
  • @BrianCain How would I make a pool of processes ready to execute like you are suggesting? Do you mean that I create the subprocesses before they are needed and have them wait in a while-loop for instructions/data to be sent through e.g. a queue object, after which they immediately exit their while-loop and begin their work? – Karim Bahgat Jan 06 '14 at 16:51
  • @KarimBahgat, that's exactly what I mean. – Brian Cain Jan 06 '14 at 18:33
  • @delnan: I believe PyPy had an experimental no-GIL branch at one point, but it was canceled. Partly because the GC is a lot simpler with a GIL, partly because PyPy explicitly wants to run all CPython code, even code that incorrectly assumes the GIL. The future is the new STM branch, which will be able to simulate a GIL (and even let Python code release it the same way C code does, if that's useful to your semantics) without actually having one (so your code runs in parallel). See the [STM project page](http://pypy.org/tmdonate2.html), and the links from there, for updates. – abarnert Aug 11 '14 at 07:15
  • @KarimBahgat: Instead of building a pool yourself, you can (and usually should) use the ones built into the stdlib, `multiprocessing.Pool` and `concurrent.futures.ProcessPoolExecutor`. (The same goes for threads, but with `multiprocessing.dummy.Pool` and `concurrent.futures.ThreadPoolExecutor`.) There are a lot of wibbly things to get right with a thread pool, and a lot of things you can build on top of it to make it easier to use, and all that work has been done for you, so use the included batteries. – abarnert Aug 11 '14 at 07:17
  • Don't put answers in the question. – jonrsharpe Nov 18 '14 at 10:59
  • Have a look at this: https://www.toptal.com/python/beginners-guide-to-concurrency-and-parallelism-in-python – Irfan Ashraf Oct 02 '18 at 05:14

4 Answers


The GIL does not prevent threading. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads.

What the GIL prevents, then, is making use of more than one CPU core or separate CPUs to run threads in parallel.

This only applies to Python code. C extensions can and do release the GIL to allow multiple threads of C code and one Python thread to run across multiple cores. This extends to I/O controlled by the kernel, such as select() calls for socket reads and writes, making Python handle network events reasonably efficiently in a multi-threaded multi-core setup.
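A minimal sketch of this effect, using `time.sleep` as a stand-in for blocking I/O (like a real socket read, it releases the GIL while it waits), so five 0.2-second waits overlap instead of stacking up:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):
    # stand-in for blocking I/O such as a socket read; like real I/O,
    # time.sleep releases the GIL while waiting
    time.sleep(0.2)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, range(5)))
elapsed = time.perf_counter() - start

print(results)   # [0, 1, 2, 3, 4]
print(elapsed)   # roughly 0.2s rather than 1.0s: the waits overlap
```

Swap `fetch` for a real network or file read and the same overlap applies.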

What many server deployments then do, is run more than one Python process, to let the OS handle the scheduling between processes to utilize your CPU cores to the max. You can also use the multiprocessing library to handle parallel processing across multiple processes from one codebase and parent process, if that suits your use cases.

Note that the GIL is only applicable to the CPython implementation; Jython and IronPython use a different threading implementation (the native Java VM and .NET common runtime threads respectively).

To address your update directly: any task that tries to get a speed boost from parallel execution of pure Python code will not see a speed-up, as threaded Python code is locked to one thread executing at a time. If you mix in C extensions or I/O, however (such as PIL or numpy operations), then that C code can run in parallel with one active Python thread.

Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck more than the Python code. It is not suitable for parallelizing computationally intensive Python code; stick to the multiprocessing module for such tasks, or delegate to a dedicated external library.

Martijn Pieters
  • Thanks @MartijnPieters, then I have a clearer answer to my question of whether threading can be used to speed up code such as a for-loop, which is "no". Maybe you or someone could write a new answer that I can accept that provides some specific examples of common modules/codes/operations where threading will be allowed by the GIL to run in parallel and thus faster (e.g. examples of those I/O and network/socket read operations that have been mentioned, and any other cases where multithreading in Python is useful). Maybe a nice list of common multithread uses and some programming examples if possible? – Karim Bahgat Jan 06 '14 at 17:15
  • No, I don't think that such an answer would be very helpful, to be honest. You cannot create an exhaustive list, ever, but the rule of thumb is that any I/O (file reading and writing, network sockets, pipes) is handled in C, and a lot of C libraries also release the GIL for their operations, but it is up to the libraries to document this for you. – Martijn Pieters Jan 06 '14 at 17:19
  • My bad, didn't see your updated answer until now, where you gave some nice examples of thread usage. These included **(correct me if I'm wrong)** network programming (e.g. `urllib.urlopen()`?), calling one Python script from within a Python GUI, and calling multiple PIL (e.g. `Image.transform()`) and numpy (e.g. `numpy.array()`) operations with threads. And you provided some more examples in your comment, such as using multiple threads to read files (e.g. `f.read()`?). I know an exhaustive list isn't possible, just wanted the types of examples you gave in your update. Either way, accepted your answer :) – Karim Bahgat Jan 06 '14 at 17:50
  • @KarimBahgat: Yes, `urllib.urlopen()` would invoke network sockets; waiting for socket I/O is an excellent opportunity to switch threads and do something else. – Martijn Pieters Jan 06 '14 at 17:52
  • Although it's not directly relevant to this problem, it's worth noting that sometimes threading isn't about performance at all; it may just be simpler to write your code as multiple independent threads of execution. For example, you may have one thread playing background music, one servicing the UI, and one chugging away on computations that have to be done eventually but aren't in any rush. Trying to sequence playing the next audio buffer with the UI runloop, or break down your computation into small enough pieces to not interfere with interactivity, may be a lot harder than using threads. – abarnert Aug 11 '14 at 07:10
  • @MartijnPieters I was under the impression that ALL I/O-bound threads can be parallelized, like reading from 2 websockets at the same time or reading from two files at the same time. Are you suggesting that depending on the underlying implementation they won't be parallelized? – vi_ral Jun 27 '20 at 18:30
  • Operations like network calls and file I/O can also be done in the background using async programming, so what is the advantage of multithreading in Python over asynchronous programming? @Martijn Pieters – Rahul Palve Jan 02 '22 at 16:17
  • @RahulPalve: async is great for I/O bound problems, threads for everything else. – Martijn Pieters Jan 19 '22 at 22:12

Yes. :)

You have the low-level thread module (renamed `_thread` in Python 3) and the higher-level threading module. But if you simply want to use multicore machines, the multiprocessing module is the way to go.

Quote from the docs:

In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
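The quoted limitation is easy to observe (a small experiment; exact timings will vary by machine): on CPython, two threads doing pure-Python work take about as long as doing the work twice serially, because the GIL lets only one of them execute at a time.

```python
import threading
import time

def spin(n):
    # pure-Python CPU work: the GIL lets only one thread execute it at a time
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 1_500_000

start = time.perf_counter()
spin(N)
spin(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, threaded is about the same as (often slightly worse than) serial
print(f"serial {serial:.2f}s threaded {threaded:.2f}s")
```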

Guy Avraham
zord
  • It's been a while, but since I was just looking for it: to be 100% sure, you should always check the CPython source code for Py_BEGIN_ALLOW_THREADS around the code. In my case I wanted to be 100% sure that Process.join didn't hold the GIL of the caller, and in cpython::Modules::_winapi.c (for example) the wait function (which join uses) wraps this macro around the actual WaitForMultipleObjects. – SonarJetLens Apr 08 '22 at 07:57

Threading is allowed in Python; the only problem is that the GIL will make sure that just one thread is executed at a time (no parallelism).

So if you multi-thread code to speed up computation, it won't get any faster, since just one thread executes at a time; but if you use threads to interact with a database, for example, it will.

r.guerbab

I feel for the poster, because the answer is invariably "it depends on what you want to do". However, parallel speed-up in Python has always been terrible in my experience, even with multiprocessing.

For example check this tutorial out (second to top result in google): https://www.machinelearningplus.com/python/parallel-processing-python/

I put timings around this code and increased the number of processes (2, 4, 8, 16) for the pool map function, and got the following bad timings:

serial 70.8921644706279 
parallel 93.49704207479954 tasks 2
parallel 56.02441442012787 tasks 4
parallel 51.026168536394835 tasks 8
parallel 39.18044807203114 tasks 16

Code (array size increased at the start; my compute node has 40 CPUs, so I've got plenty to spare here):

import multiprocessing as mp
import numpy as np
import time

arr = np.random.randint(0, 10, size=[2000000, 600])
.... more code ....
tasks = [2, 4, 8, 16]

for task in tasks:
    tic = time.perf_counter()
    pool = mp.Pool(task)

    results = pool.map(howmany_within_range_rowonly, [row for row in data])

    pool.close()
    toc = time.perf_counter()
    time1 = toc - tic
    print(f"parallel {time1} tasks {task}")