311

I found that in Python 3.4, there are a few different libraries for multiprocessing/threading: multiprocessing vs threading vs asyncio.

But I don't know which one to use, or which is the "recommended" one. Do they do the same thing, or are they different? If they are different, which one is used for what? I want to write a program that uses multiple cores in my computer, but I don't know which library I should learn.

Super Kai - Kazuya Ito
user3654650
  • 4
    Maybe [I’m too stupid for AsyncIO](https://whatisjasongoldstein.com/writing/im-too-stupid-for-asyncio/) helps – Martin Thoma Apr 23 '18 at 09:01
  • I'm too stupid for AsyncIO site was retired, and can still be found at: https://web.archive.org/web/20210801000000*/https://whatisjasongoldstein.com/writing/im-too-stupid-for-asyncio/ Also read the excellent response https://medium.com/@pgjones/understanding-asyncio-a6592a517def – bluppfisk Aug 25 '22 at 16:48
  • 1
    Working link to archive.org [I'm too stupid for AsyncIO](https://web.archive.org/web/20210917225127/https://whatisjasongoldstein.com/writing/im-too-stupid-for-asyncio/) – radioxoma Mar 06 '23 at 22:09

10 Answers

289

TL;DR

Making the Right Choice:

We have walked through the most popular forms of concurrency. But the question remains: when should you choose which one? It really depends on the use case. From my experience (and reading), I tend to follow this pseudo code:

if io_bound:
    if io_very_slow:
        print("Use Asyncio")
    else:
        print("Use Threads")
else:
    print("Multi Processing")
  • CPU Bound => Multi Processing
  • I/O Bound, Fast I/O, Limited Number of Connections => Multi Threading
  • I/O Bound, Slow I/O, Many connections => Asyncio

Reference


[NOTE]:

  • If you have a long-running call (e.g. a method that sleeps or waits on lazy I/O), the best choice is the asyncio, Twisted, or Tornado approach (coroutines), which provides concurrency within a single thread.
  • asyncio works on Python 3.4 and later.
  • Tornado and Twisted have been available since Python 2.7.
  • uvloop is an ultra-fast asyncio event loop (according to the uvloop project, it makes asyncio 2-4x faster).

[UPDATE (2019)]:

  • Japronto (GitHub) is a very fast pipelining HTTP server based on uvloop.
Benyamin Jafari
  • 3
    So if I have a list of urls to request, it's better to use *Asyncio*? – mingchau Jul 29 '19 at 13:00
  • 5
    @mingchau, Yes, but keep in mind that `asyncio` only helps when you call awaitable functions. The `requests` library is not awaitable; instead you can use something such as the [`aiohttp` library](https://pypi.org/project/aiohttp-requests/) or [async-request](https://pypi.org/project/requests-async/). – Benyamin Jafari Jul 29 '19 at 14:18
  • 3
    please extend on slowIO and fastIO to go multithread or asyncio>? – droid192 Sep 04 '19 at 17:26
  • @qrtLs When you have a SlowIO, AsyncIO is very helpful and more efficient. – Benyamin Jafari Sep 05 '19 at 07:23
  • 2
    Please can you advise what exactly is io_very_slow – variable Nov 06 '19 at 07:04
  • What is an example of io_very_slow ? – variable Nov 06 '19 at 07:33
  • 14
    @variable I/O bound means your program spends most of its time talking to a slow device, like a network connection, a hard drive, a printer, or an event loop with a sleep time. So in blocking mode, you could choose between threading or asyncio, and if your bounding section is very slow, cooperative multitasking (asyncio) is a better choice (i.e. avoiding to resource starvation, dead-locks, and race conditions) – Benyamin Jafari Nov 07 '19 at 19:19
  • 1
    As always, there are of course exceptions to such rules. One example is if you need to run a non-trivial subprocess. The `subprocess` module and its `Popen` class use a busy-loop while waiting for the subprocess to complete, while `asyncio.create_subprocess_exec()` and friends use the loop's poller. So `asyncio` can be better for such use-cases too. – hadriel Jun 17 '22 at 23:22
  • @BenyaminJafari-aGn. Thank you. Suppose we have a server that receives 10 frames per second to detect faces and return back the result to client, which approach would you recommend in this case please? – Avv Nov 08 '22 at 17:49
  • @hadriel. So, you think asyncio is better if we have a subprocess (running C function using Ctypes) than just using threading? Can you please elaborate on why if that is what you meant? – Avv Nov 08 '22 at 17:52
156

They are intended for (slightly) different purposes and/or requirements. CPython (the typical, mainline Python implementation) still has the global interpreter lock, so a multi-threaded application (a standard way to implement parallel processing nowadays) is suboptimal for CPU-bound work. That's why multiprocessing may be preferred over threading. But not every problem can be effectively split into [almost independent] pieces, so there may be a need for heavy interprocess communication. That's why multiprocessing may not be preferred over threading in general.

asyncio (this technique is available not only in Python; other languages and/or frameworks also have it, e.g. Boost.ASIO) is a method to effectively handle a lot of I/O operations from many simultaneous sources without the need for parallel code execution. So it's just a solution (a good one indeed!) for a particular task, not for parallel processing in general.
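
To make "many I/O operations without parallel execution" concrete, here is a minimal sketch. It assumes `asyncio.sleep` as a stand-in for a real non-blocking network call, and the names `fake_request` and `fetch_all` are made up for this illustration (Python 3.9+ for the `list[str]` annotation).

```python
import asyncio
import threading
import time


async def fake_request(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a non-blocking network call (an assumption).
    await asyncio.sleep(delay)
    # Report which thread ran this coroutine.
    return f"{name} finished on {threading.current_thread().name}"


async def fetch_all(count: int, delay: float) -> list[str]:
    # Start all the "requests" at once and wait for every result.
    return await asyncio.gather(
        *(fake_request(f"request-{i}", delay) for i in range(count))
    )


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all(count=100, delay=0.5))
    elapsed = time.perf_counter() - start
    print(results[0])
    print(f"{len(results)} waits overlapped in {elapsed:.2f} s")
```

All the waits overlap even though every coroutine reports the same thread name, so the total time should stay close to a single delay rather than the sum of all of them.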

Rafa Viotti
user3159253
  • 18
    Noting that while all three may not achieve parallelism, they are all capable of doing concurrent (non-blocking) tasks. – sargas Aug 11 '15 at 18:03
  • Thank you. Can you elaborate more on what does "handle a lot of I/O operations from many simultaneous sources w/o need of parallel code execution" mean, please? – Avv Nov 08 '22 at 18:46
  • Please take a look at the [aiohttp](https://docs.aiohttp.org/en/stable/) package (a tutorial is [here](https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-aiohttp)) to get an idea of how to handle multiple I/O operations without using (many) threads. In fact, aiohttp is based on coroutines which were fully introduced in Python after the answer was created (the first version of the [asyncio](https://docs.python.org/3.4/library/asyncio.html) library first appeared in Python 3.4 and was fully shaped in 3.5, in late 2015... – user3159253 Nov 09 '22 at 21:44
  • 1
    ... However similar techniques had been available earlier since at least late 90s/early 2000s, long before coroutines became mainstream in the Python-based development, e.g. in [Twisted](https://twisted.org/). The idea is to build an application around an event loop, use non-blocking I/O operations, perform all short activities quickly inside a main thread (or in a limited thread pool) and put long lasting operations such as heavy calculations to external workers, collecting their results in non-blocking manner... – user3159253 Nov 09 '22 at 22:01
91

In multiprocessing you leverage multiple CPUs to distribute your calculations. Since each of the CPUs runs in parallel, you're effectively able to run multiple tasks simultaneously. You would want to use multiprocessing for CPU-bound tasks. An example would be calculating the sum of all elements of a huge list. If your machine has 8 cores, you can "cut" the list into 8 smaller lists, calculate the sum of each of those lists on a separate core, and then just add up those numbers. Ideally that gives close to an 8x speedup, minus the cost of starting the processes and moving the data between them.
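
The list-splitting idea could look roughly like the sketch below. It uses `concurrent.futures.ProcessPoolExecutor` rather than raw `multiprocessing.Process`; the helper names `split_into_chunks` and `parallel_sum`, and the `executor_cls` parameter, are assumptions made for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def split_into_chunks(numbers, parts):
    """Split `numbers` into at most `parts` contiguous slices."""
    size = max(1, -(-len(numbers) // parts))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def parallel_sum(numbers, workers=8, executor_cls=ProcessPoolExecutor):
    """Sum each chunk in its own worker, then add up the partial sums."""
    chunks = split_into_chunks(numbers, workers)
    with executor_cls(max_workers=workers) as executor:
        # The built-in sum runs once per chunk inside a worker.
        partial_sums = executor.map(sum, chunks)
        return sum(partial_sums)


if __name__ == "__main__":  # the guard matters where processes are spawned (e.g. Windows)
    data = list(range(1_000_000))
    print(parallel_sum(data) == sum(data))
    # Same interface with threads; in CPython this will not use extra cores.
    print(parallel_sum(data, executor_cls=ThreadPoolExecutor) == sum(data))
```

Note that each chunk has to be copied to its worker process, so for a cheap operation like `sum` the copying can easily cost more than the parallelism saves; the approach pays off when the per-chunk computation is expensive.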

In (multi)threading you don't need multiple CPUs. Imagine a program that sends lots of HTTP requests to the web. If you used a single-threaded program, it would stop the execution (block) at each request, wait for a response, and then continue once it has received the response. The problem here is that your CPU isn't really doing work while waiting for some external server to do the job; it could have actually done some useful work in the meantime! The fix is to use threads: you can create many of them, each responsible for requesting some content from the web. The nice thing about threads is that, even if they run on one CPU, the CPU from time to time "freezes" the execution of one thread and jumps to executing another one (this is called context switching, and it happens constantly at non-deterministic intervals). So if your task is I/O bound, use threading.
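
A minimal sketch of that pattern, assuming `time.sleep` as a stand-in for a blocking HTTP call; `fake_download` and `download_all` are hypothetical names, and the URLs are placeholders that are never actually requested.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url: str) -> str:
    # time.sleep stands in for a blocking network call (an assumption);
    # while one thread sleeps, the other threads are free to run.
    time.sleep(0.2)
    return f"content of {url}"


def download_all(urls, max_workers=10):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns the results in the same order as the inputs.
        return list(executor.map(fake_download, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    start = time.perf_counter()
    pages = download_all(urls)
    print(f"{len(pages)} pages in {time.perf_counter() - start:.2f} s")
```

Run sequentially, the ten calls would take about two seconds; with ten threads the waits overlap and the total should be close to a single call.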

asyncio is essentially threading where not the CPU but you, as a programmer (or actually your application), decide where and when the context switch happens. In Python you use the await keyword to suspend the execution of your coroutine (defined using the async keyword).
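
A small sketch of what "you decide where the switch happens" means in practice. The coroutine names `worker` and `run_workers` are made up for this example, and `asyncio.sleep(0)` is used only as an explicit "let others run now" point.

```python
import asyncio


async def worker(name: str, steps: int, log: list) -> None:
    for step in range(steps):
        log.append(f"{name}{step}")
        # The only place where this coroutine can be suspended.
        await asyncio.sleep(0)


async def run_workers() -> list:
    log = []
    # Both workers run on one thread and take turns at each await.
    await asyncio.gather(worker("A", 2, log), worker("B", 2, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_workers()))  # ['A0', 'B0', 'A1', 'B1']
```

Because the switches only happen at the `await`, the interleaving is predictable here, which is not something you can rely on with preemptively scheduled threads.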

Tomasz Bartkowiak
  • If I have multiple threads and then I start getting the responses faster - and after the responses my work is more CPU bound - would my process use the multiple cores? That is, would it freeze threads instead of also using the multiple cores? – aspiring1 Oct 29 '20 at 04:38
  • 1
    Not sure if I understood the question. Is it about whether you should use multiple cores when responses become faster? If that's the case - it depends how fast the responses are and how much time you really spend waiting for them vs. using CPU. If you're spending majority of time doing CPU-intensive tasks then it'd be beneficial to distribute over multiple cores (if possible). And if the question if whether the system would spontaneously switch to parallel processing after "realizing" its job is CPU-bound - I don't think so - usually you need to tell it explicitly to do so. – Tomasz Bartkowiak Oct 29 '20 at 09:18
  • I was thinking of a chatbot application, in which the chatbot messages by users is sent to the server and the responses are sent back by the server using a POST request? Do you think is this more of a CPU intensive task, since the response sent & received can be json, but I was doubtful - what would happen if the user takes time to type his response, is this an example of slow I/O? (user sending response late) – aspiring1 Oct 29 '20 at 10:49
  • @TomaszBartkowiak Hi, I have a question: So I have a realtime facial-recongnition model that takes in input from a webcam and shows whether a user is present or not. There is an obvious lag because all the frames are not processed in real-time as the processesing rate is slower. Can you tell me if multi-threading can help me here if I create like 10 threads to process 10 frames rather than processing those 10 frames on one thread? And just to clarify, by processing I mean, there is a trained model on keras that takes in an image frame as an input and outputs if a person is detected or not. – Talal Zahid Jul 04 '21 at 12:14
  • 1
    @TalalZahid your task seems to be CPU bound - it's only the machine (CPU) that performs inference (detection), as opposed to waiting for IO or someone else to do some part of the work (i.e. calling external API). So it would not make sense to do multithreading. If processing a given frame takes a considerable amount of time (does it?) and each frame is independent then you might consider distributing detection across separate machines/core. – Tomasz Bartkowiak Jul 05 '21 at 15:47
  • 14
    I like how you mention that developers control the context switch in `async` but the OS controls it in `threading` – Arkyo Nov 07 '21 at 02:34
  • Regarding threading, why is it good for lots of HTTP requests? i.e. I've 3 requests and 3 threads, each time CPU polls one of the threads and does a small progress in getting the data how is it better than getting all the data at once from each request one by one? I can rephrase the question, when a sleeping thread sends a web request, does he get answer even if CPU runs another thread? if so how is it possible? – TheLogicGuy Jan 14 '22 at 10:15
  • @TheLogicGuy Once the request is sent, without threading the (sequential) program freezes until it gets a response from an external server (note that this includes: time for the network to send the request out, time to do compute by an external server and time for network to deliver back the response). During that time you could have switched to other thread so that your CPU isn't idle. Re _does he get answer even if CPU runs another thread_ - the thread (after "waking up") might be e.g. checking some message queue if there is a response for a request it had sent before yielding control back. – Tomasz Bartkowiak Jan 14 '22 at 10:33
  • @TomaszBartkowiak so if I understand, this message queue that manages the responses must be on a different process so when each thread wakes up it checks if its request was finished in that queue that was managed by someone else that was awake all that time? – TheLogicGuy Jan 14 '22 at 10:47
  • Typically yes but I presume there could be other ways that don't require a queue running in a separate process but e.g. some dedicated shared memory buffer that could be written to by some other process via IPC). In some other cases you can have e.g. a distributed queue (e.g. `kafka`) which not only runs in a separate process but in a different container (or a different virtual machine). – Tomasz Bartkowiak Jan 14 '22 at 10:56
  • @TomaszBartkowiak. What if the frames are dependent on each other, please? I am commenting on your response to TalalZahid – Avv Nov 08 '22 at 18:16
  • 1
    @Arkyo well technically OS controls all context switching, but with threading OS implements preemptive multitasking, so it decides on its own (based on various algorithms) when to switch, while with asyncio OS implements cooperative multitasking so it cooperates with a running thread and switches context when the thread signals that it is ready to yield control of the cpu. – ruslaniv Feb 14 '23 at 10:25
45

This is the basic idea:

Is it IO-BOUND ? -----------> USE asyncio

IS IT CPU-HEAVY ? ---------> USE multiprocessing

ELSE ? ----------------------> USE threading

So basically stick to threading unless you have IO/CPU problems.

Farshid Ashouri
  • 13
    what is the 3rd problem you could have? – EralpB Dec 23 '21 at 13:11
  • 3
    @EralpB Not io or CPU bound, like a thread worker doing simple calculation or reading chunks of data locally or from a fast local database. Or just sleeping and watching something. Basically, most problems fall into this criteria unless you have a networking application or a heavy calculation. – Farshid Ashouri Dec 24 '21 at 00:52
  • 1
    Well if you are doing any type of calculation then this is an CPU bound problem so you should use multiprocessing and if is simple you may not need any concurrency solution. In the case of read data locally or from a database, well this is an IO bound problem, so either threading or asyncio could help you. The main difference between the two is that in asyncio you have more control than threading and threading has a initialization cost to your program, so if you plan to use a lot of threads maybe asyncio will suit better to you. I don't think that we have another type of problem besides those – CAIO WANDERLEY Jun 03 '22 at 13:38
  • I am a bit lost with the concept of IO in threading. What are examples of IO probems? – Arjuna Deva Jun 22 '22 at 20:15
  • @FarshidAshouri. So a server receives frames to detect faces and send result back to client does not fit into "threading" category please? – Avv Nov 08 '22 at 18:49
  • @ArjunaDeva. An example is waiting for a request from an internet connection. – Avv Nov 08 '22 at 18:50
36

Many of the answers suggest how to choose only 1 option, but why not be able to use all 3? In this answer I explain how you can use asyncio to manage combining all 3 forms of concurrency instead as well as easily swap between them later if need be.

The short answer


Many developers who are new to concurrency in Python will end up using multiprocessing.Process and threading.Thread. However, these are the low-level APIs, which have been merged together by the high-level API provided by the concurrent.futures module. Furthermore, spawning processes and threads has overhead, such as requiring more memory, a problem which plagued one of the examples I show below. To an extent, concurrent.futures manages this for you: it spawns only a few processes and re-uses them each time one finishes, so you cannot as easily do something like spawn a thousand processes and crash your computer.

These high-level APIs are provided through concurrent.futures.Executor, which is implemented by concurrent.futures.ProcessPoolExecutor and concurrent.futures.ThreadPoolExecutor. In most cases, you should use these over multiprocessing.Process and threading.Thread, because it's easier to change from one to the other in the future when you use concurrent.futures, and you don't have to learn the detailed differences of each.

Since these share a unified interface, you'll also find that code using multiprocessing or threading will often use concurrent.futures. asyncio is no exception to this, and provides a way to use it via the following code:

import asyncio
from concurrent.futures import Executor
from functools import partial
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")

async def run_in_executor(
    executor: Optional[Executor],
    func: Callable[..., T],
    /,
    *args: Any,
    **kwargs: Any,
) -> T:
    """
    Run `func(*args, **kwargs)` asynchronously, using an executor.

    If the executor is None, use the default ThreadPoolExecutor.
    """
    return await asyncio.get_running_loop().run_in_executor(
        executor,
        partial(func, *args, **kwargs),
    )

# Example usage for running `print` in a thread.
async def main():
    await run_in_executor(None, print, "O" * 100_000)

asyncio.run(main())

In fact it turns out that using threading with asyncio was so common that in Python 3.9 they added asyncio.to_thread(func, *args, **kwargs) to shorten it for the default ThreadPoolExecutor.
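
For comparison, a short sketch of `asyncio.to_thread` (Python 3.9+). The `blocking_read` function and the `demo` coroutine are hypothetical, with `time.sleep` standing in for a blocking call.

```python
import asyncio
import threading
import time


def blocking_read(delay: float) -> str:
    # Stand-in for a blocking call such as a file read or requests.get.
    time.sleep(delay)
    return threading.current_thread().name


async def demo() -> tuple:
    loop_thread = threading.current_thread().name
    # Both blocking calls run in worker threads of the default executor,
    # so the event loop thread stays free while they wait.
    workers = await asyncio.gather(
        asyncio.to_thread(blocking_read, 0.5),
        asyncio.to_thread(blocking_read, 0.5),
    )
    return loop_thread, workers


if __name__ == "__main__":
    loop_thread, workers = asyncio.run(demo())
    print("event loop ran on:", loop_thread)
    print("blocking calls ran on:", workers)
```

The printed worker thread names should differ from the event loop's thread, and the two half-second calls should finish in roughly half a second overall.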

The long answer


Are there any disadvantages to this approach?

Yes. With asyncio, the biggest disadvantage is that asynchronous functions aren't the same as synchronous functions. This can trip up new users of asyncio a lot and cause a lot of rework to be done if you didn't start programming with asyncio in mind from the beginning.

Another disadvantage is that users of your code will also become forced to use asyncio. All of this necessary rework will often leave first-time asyncio users with a really sour taste in their mouth.

Are there any non-performance advantages to this?

Yes. Similar to how using concurrent.futures is advantageous over threading.Thread and multiprocessing.Process for its unified interface, this approach can be considered a further abstraction from an Executor to an asynchronous function. You can start off using asyncio, and if later you find a part of it you need threading or multiprocessing, you can use asyncio.to_thread or run_in_executor. Likewise, you may later discover that an asynchronous version of what you're trying to run with threading already exists, so you can easily step back from using threading and switch to asyncio instead.

Are there any performance advantages to this?

Yes... and no. Ultimately it depends on the task. In some cases, it may not help (though it likely does not hurt), while in other cases it may help a lot. The rest of this answer provides some explanations as to why using asyncio to run an Executor may be advantageous.

- Combining multiple executors and other asynchronous code

asyncio essentially provides significantly more control over concurrency, at the cost of having to manage that concurrency yourself. If you want to simultaneously run some code using a ThreadPoolExecutor alongside some other code using a ProcessPoolExecutor, it is not so easy to manage this using synchronous code, but it is very easy with asyncio.

import asyncio
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

async def with_processing():
    with ProcessPoolExecutor() as executor:
        tasks = [...]
        for task in asyncio.as_completed(tasks):
            result = await task
            ...

async def with_threading():
    with ThreadPoolExecutor() as executor:
        tasks = [...]
        for task in asyncio.as_completed(tasks):
            result = await task
            ...

async def main():
    await asyncio.gather(with_processing(), with_threading())

asyncio.run(main())

How does this work? Essentially asyncio asks the executors to run their functions. Then, while an executor is running, asyncio will go run other code. For example, the ProcessPoolExecutor starts a bunch of processes, and then while waiting for those processes to finish, the ThreadPoolExecutor starts a bunch of threads. asyncio will then check in on these executors and collect their results when they are done. Furthermore, if you have other code using asyncio, you can run them while waiting for the processes and threads to finish.
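
A runnable sketch of the same idea, assuming `loop.run_in_executor` is used to create the awaitables. The helper name `run_mixed` and the stand-in workloads (`math.factorial` for CPU work, `time.sleep` for blocking I/O) are illustrative assumptions, not part of the code above.

```python
import asyncio
import math
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


async def run_mixed(cpu_executor, io_executor):
    loop = asyncio.get_running_loop()
    # CPU-style jobs: math.factorial is importable by name, so it can be
    # pickled and sent to worker processes.
    cpu_jobs = [loop.run_in_executor(cpu_executor, math.factorial, n)
                for n in (5, 10)]
    # I/O-style jobs: time.sleep stands in for a blocking call.
    io_jobs = [loop.run_in_executor(io_executor, time.sleep, 0.1)
               for _ in range(3)]
    # Every job was submitted above and is already running; collect results.
    cpu_results = await asyncio.gather(*cpu_jobs)
    await asyncio.gather(*io_jobs)
    return cpu_results


async def main():
    with ProcessPoolExecutor() as processes, ThreadPoolExecutor() as threads:
        print(await run_mixed(processes, threads))


if __name__ == "__main__":
    asyncio.run(main())
```

Since `run_mixed` only depends on the shared `Executor` interface, either argument can be swapped for the other kind of pool without touching the coroutine.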

- Narrowing in on what sections of code need executors

It is not common to have many executors in your code, but a common problem that I have seen when people use threads/processes is that they will shove the entirety of their code into a thread/process, expecting it to work. For example, I once saw the following code (approximately):

from concurrent.futures import ThreadPoolExecutor
import requests

def get_data(url):
    return requests.get(url).json()["data"]

urls = [...]

with ThreadPoolExecutor() as executor:
    for data in executor.map(get_data, urls):
        print(data)

The funny thing about this piece of code is that it was slower with concurrency than without. Why? Because the resulting json was large, and having many threads consume a huge amount of memory was disastrous. Luckily the solution was simple:

from concurrent.futures import ThreadPoolExecutor
import requests

urls = [...]

with ThreadPoolExecutor() as executor:
    for response in executor.map(requests.get, urls):
        print(response.json()["data"])

Now only one json is unloaded into memory at a time, and everything is fine.

The lesson here?

You shouldn't try to just slap all of your code into threads/processes. Instead, focus on the part of the code that actually needs concurrency.

But what if get_data was not a function as simple as this case? What if we had to apply the executor somewhere deep in the middle of the function? This is where asyncio comes in:

import asyncio
import requests

async def get_data(url):
    # A lot of code.
    ...
    # The specific part that needs threading.
    response = await asyncio.to_thread(requests.get, url, some_other_params)
    # A lot of code.
    ...
    return data

urls = [...]

async def main():
    tasks = [get_data(url) for url in urls]
    for task in asyncio.as_completed(tasks):
        data = await task
        print(data)

asyncio.run(main())

Attempting the same with concurrent.futures is by no means pretty. You could use things such as callbacks, queues, etc., but it would be significantly harder to manage than basic asyncio code.

Simply Beautiful Art
  • can you elaborate on the reason why using `requests.get` instead of `get_data` would avoid unloading json objects into memory? they are both functions and in order to return from that, the `requests.get` seems also need to unload the object into memory. – Zac Wrangler May 20 '22 at 16:07
  • 2
    @ZacWrangler There are two significant components to the process here: `requests.get(...)` and `.json()["data"]`. One performs an API request, the other loads the desired data into memory. Applying `threading` to the API request may result in a significant performance improvement because your computer isn't doing any work for it, it's just waiting for stuff to get downloaded. Applying `threading` to the `.json()["data"]` may (and likely will) result in multiple `.json()`'s to start at the same time, and *eventually* followed by `["data"]`, perhaps after ALL of the `.json()`'s are ran. – Simply Beautiful Art May 20 '22 at 17:25
  • 3
    (cont.) In the latter case, this could cause a significant amount of memory to get loaded in at once (size of the `.json()` times the amount of threads), which can be catastrophic for performance. With `asyncio`, you can easily cherry-pick what code gets ran with `threading` and what code doesn't, allowing you to choose not to run `.json()["data"]` with `threading` and instead only load them one at a time. – Simply Beautiful Art May 20 '22 at 17:27
  • Thank you very much. So based on your experience, has there been anything faster or better performed than Asyncio EventLoop to work with Python threading please? – Avv Nov 09 '22 at 21:35
  • 1
    @Avv As far as the event loop itself, it should be largely impossible to even notice any slowdown from the event loop itself. In other words, the event loop is most likely not the issue but rather some other code you've written poorly. The main advantage to using `asyncio` is the ability to cleanly organize your code, giving you more ways to avoid poorly written concurrent code. – Simply Beautiful Art Nov 10 '22 at 22:11
9

There are already a lot of good answers, and I can't elaborate more on when to use each one. This is more an interesting combination of two of them, multiprocessing + asyncio: https://pypi.org/project/aiomultiprocess/.

The use case for which it was designed was high I/O while still utilizing as many of the available cores as possible. Facebook used this library to write some kind of Python-based file server. asyncio handles the I/O-bound traffic, while multiprocessing allows multiple event loops and threads on multiple cores.

Example code from the repo:

import asyncio
from aiohttp import request
from aiomultiprocess import Pool

async def get(url):
    async with request("GET", url) as response:
        return await response.text("utf-8")

async def main():
    urls = ["https://jreese.sh", ...]
    async with Pool() as pool:
        async for result in pool.map(get, urls):
            ...  # process result
            
if __name__ == '__main__':
    # Python 3.7
    asyncio.run(main())
    
    # Python 3.6
    # loop = asyncio.get_event_loop()
    # loop.run_until_complete(main())

Just an addition here: this would not work very well in, say, a Jupyter notebook, as the notebook already has an asyncio loop running. Just a little note so you don't pull your hair out.

Christo Goosen
  • A whole package isn't super necessary for this, you can see my answer on how to do most of this using normal `asyncio` and `concurrent.futures.ProcessPoolExecutor`. A notable difference is that `aiomultiprocess` works on coroutines, which means it likely spawns many event loops instead of using one unified event loop (as seen from the source code), for better or worse. – Simply Beautiful Art Jan 31 '22 at 05:40
  • Of course its not necessary for a library. But the point of the library is multiple event loops. This was built at Facebook in a situation where they wanted to use every available CPU for a python based object/file store. Think django spawning multiple subprocesses with uwsgi and each has mutliple threads. – Christo Goosen Jan 31 '22 at 10:09
  • Also the library removes some boilerplate code, simplifies it for the developer. – Christo Goosen Jan 31 '22 at 10:09
  • 1
    Thanks for explaining the difference, I think I now have a better understanding of its purpose. Rather than really being for computationally expensive tasks, as you might normally think for `multiprocessing`, where it actually shines is in running multiple event loops. That is to say, this is the option to go to if you find the event loop for `asyncio` itself to have become the bottleneck, such as due to a sheer number of clients on a server. – Simply Beautiful Art Jan 31 '22 at 13:44
  • Pleasure. Yeah I happened to watch a youtube video where the author described its use. Was very insightful as it explained the purpose well. Definitely not a magic bullet and probably not the use case for everyone. Would perhaps be at the core of web server or low level network application. Basically just churn through as many requests as CPUs and the multiple event loops can handle. https://www.youtube.com/watch?v=0kXaLh8Fz3k – Christo Goosen Jan 31 '22 at 20:30
7

I’m not a professional Python user, but as a student in computer architecture I think I can share some of my considerations when choosing between multiprocessing and multithreading. Besides, some of the other answers (even among those with higher votes) are misusing technical terminology, so I think it’s also necessary to make some clarifications on those as well, and I’ll do that first.

The fundamental difference between multiprocessing and multithreading is whether they share the same memory space. Threads share access to the same virtual memory space, so it is efficient and easy for threads to exchange their computation results (zero copy, and totally user-space execution).

Processes on the other hand have separate virtual memory spaces. They cannot directly read or write another process’ memory space, just like a person cannot read or alter the mind of another person without talking to them. (Allowing so would be a violation of memory protection and defeat the purpose of using virtual memory.) To exchange data between processes, they have to rely on the operating system’s facilities (e.g. message passing), and for more than one reason this is more costly than the “shared memory” scheme used by threads. One reason is that invoking the OS’ message passing mechanism requires making a system call, which switches the code execution from user mode to kernel mode and is time consuming; another reason is that the OS message passing scheme will likely have to copy the data bytes from the sender’s memory space to the receiver’s memory space, so there is a non-zero copy cost.
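
The difference in memory sharing can be observed with a small experiment. This is only a sketch; the names `shared`, `append_item`, `run_in_thread` and `run_in_process` are made up for the illustration.

```python
import multiprocessing
import threading

shared = []  # module-level list living in this process's memory


def append_item():
    shared.append("written by worker")


def run_in_thread():
    shared.clear()
    t = threading.Thread(target=append_item)
    t.start()
    t.join()
    # The thread wrote into the very same list object.
    return list(shared)


def run_in_process():
    shared.clear()
    p = multiprocessing.Process(target=append_item)
    p.start()
    p.join()
    # The child process modified its own copy; ours is untouched.
    return list(shared)


if __name__ == "__main__":
    print("thread: ", run_in_thread())
    print("process:", run_in_process())
```

The thread's write is visible in the parent, while the child process's write is not. To get data back from a process you would have to send it explicitly, for example through a `multiprocessing.Queue` or a pipe, which is the message-passing cost described above.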

It is incorrect to say that a multithreaded program can only use one CPU. The reason why many people say so is an artifact of the CPython implementation: the global interpreter lock (GIL). Because of the GIL, threads in a CPython process are serialized. As a result, it appears that a multithreaded Python program only uses one CPU.

But multi thread computer programs in general are not restricted to one core, and for Python, implementations that do not use the GIL can indeed run many threads in parallel, that is, run on more than one CPU at the same time. (See https://wiki.python.org/moin/GlobalInterpreterLock).

Given that CPython is the predominant implementation of Python, it’s understandable why multithreaded python programs are commonly equated to being bound to a single core.

With Python with GIL, the only way to unleash the power of multicores is to use multiprocessing (there are exceptions to this as mentioned below). But your problem better be easily partition-able into parallel sub-problems that have minimal intercommunication, otherwise a lot of inter-process communication will have to take place and as explained above, the overhead of using the OS’ message passing mechanism will be costly, sometimes so costly the benefits of parallel processing are totally offset. If the nature of your problem requires intense communication between concurrent routines, multithreading is the natural way to go. Unfortunately with CPython, true, effectively parallel multithreading is not possible due to the GIL. In this case you should realize Python is not the optimal tool for your project and consider using another language.

There’s one alternative solution, that is to implement the concurrent processing routines in an external library written in C (or other languages), and import that module to Python. The CPython GIL will not bother to block the threads spawned by that external library.

So, with the burdens of GIL, is multithreading in CPython any good? It still offers benefits though, as other answers have mentioned, if you’re doing IO or network communication. In these cases the relevant computation is not done by your CPU but done by other devices (in the case of IO, the disk controller and DMA (direct memory access) controller will transfer the data with minimal CPU participation; in the case of networking, the NIC (network interface card) and DMA will take care of much of the task without CPU’s participation), so once a thread delegates such task to the NIC or disk controller, the OS can put that thread to a sleeping state and switch to other threads of the same program to do useful work.

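In my understanding, the asyncio module targets the same I/O-bound use case as multithreading, but it does so differently: coroutines take turns cooperatively on a single thread, driven by an event loop, instead of each running on its own OS thread.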
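One way to check how asyncio schedules its work is to record which thread each coroutine runs on. In this small sketch (the `worker` and `main` names are illustrative), every task reports the same thread identifier, the one that called `asyncio.run`:

```python
import asyncio
import threading


async def worker(name: str, seen: dict) -> None:
    await asyncio.sleep(0.01)  # yields to the event loop, like waiting on I/O
    seen[name] = threading.get_ident()


async def main() -> dict:
    seen = {}
    await asyncio.gather(*(worker(f"task-{i}", seen) for i in range(3)))
    return seen


seen = asyncio.run(main())
print(len(set(seen.values())))  # number of distinct threads the tasks ran on
```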

So: for CPU-intensive programs that can easily be partitioned to run in multiple processes with limited communication, use multithreading if there is no GIL (e.g. Jython), or multiprocessing if there is one (e.g. CPython).

For CPU-intensive programs that require intensive communication between concurrent routines, use multithreading if there is no GIL, or use another programming language.

Lots of I/O: asyncio

fjs
6
  • Multiprocessing can run in parallel.

  • Multithreading (in CPython, because of the GIL) and asyncio (because it runs on a single thread) cannot run Python code in parallel.

With an Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz and 32.0 GB RAM, I timed how long it takes to count the prime numbers between 2 and 100000 with 2 processes, 2 threads and 2 asyncio tasks, as shown below. *This is a CPU-bound calculation:

| Multiprocessing | Multithreading | asyncio       |
|-----------------|----------------|---------------|
| 23.87 seconds   | 45.24 seconds  | 44.77 seconds |

Because multiprocessing runs in parallel, it is about twice as fast as multithreading and asyncio, as shown above.

I used 3 sets of code below:

Multiprocessing:

# "process_test.py"

from multiprocessing import Process
import time
start_time = time.time()

def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

if __name__ == "__main__": # This is needed to run processes on Windows
    process_list = []

    for _ in range(0, 2): # 2 processes
        process = Process(target=test)
        process_list.append(process)

    for process in process_list:
        process.start()

    for process in process_list:
        process.join()

    print(round((time.time() - start_time), 2), "seconds") # 23.87 seconds

Result:

...
9592
9592
23.87 seconds

Multithreading:

# "thread_test.py"

from threading import Thread
import time
start_time = time.time()

def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

thread_list = []

for _ in range(0, 2): # 2 threads
    thread = Thread(target=test)
    thread_list.append(thread)
    
for thread in thread_list:
    thread.start()

for thread in thread_list:
    thread.join()

print(round((time.time() - start_time), 2), "seconds") # 45.24 seconds

Result:

...
9592
9592
45.24 seconds

Asyncio:

# "asyncio_test.py"

import asyncio
import time
start_time = time.time()

async def test():
    num = 100000
    primes = 0
    for i in range(2, num + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    print(primes)

async def call_tests():
    tasks = []

    for _ in range(0, 2): # 2 asyncio tasks
        tasks.append(test())

    await asyncio.gather(*tasks)

asyncio.run(call_tests())

print(round((time.time() - start_time), 2), "seconds") # 44.77 seconds

Result:

...
9592
9592
44.77 seconds
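If the rest of a program is already built on asyncio, the CPU-bound part can still be pushed onto other cores by handing it to a process pool with `loop.run_in_executor`. The following is only a sketch under that assumption, not something timed for this answer; it reuses the same trial-division loop with a much smaller limit so that it finishes quickly.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit: int) -> int:
    """Same trial-division loop as above, returning the count instead of printing."""
    primes = 0
    for i in range(2, limit + 1):
        for j in range(2, i):
            if i % j == 0:
                break
        else:
            primes += 1
    return primes


async def main(limit: int, jobs: int = 2) -> list:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        # Each call runs in a separate worker process; the event loop stays free.
        futures = [loop.run_in_executor(pool, count_primes, limit) for _ in range(jobs)]
        return await asyncio.gather(*futures)


if __name__ == "__main__":  # needed where worker processes are spawned
    print(asyncio.run(main(2_000)))
```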
Henry Ecker
Super Kai - Kazuya Ito
1

Multiprocessing

Each process has its own Python interpreter and can run on a separate core of a processor. Python multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers true parallelism, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.

Use multiprocessing when you have CPU intensive tasks.

Multithreading

Python multithreading allows you to spawn multiple threads within a process. These threads share the memory and resources of that process. In CPython, because of the Global Interpreter Lock, only a single thread can execute Python bytecode at any given time, so you cannot utilize multiple cores. Multithreading in Python therefore does not offer true parallelism.
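Because threads share memory, any shared state they modify needs protection even with the GIL in place. A minimal sketch with a module-level counter guarded by a `threading.Lock` (the `add_many` name and the iteration counts are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:  # without the lock, the read-modify-write could interleave
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 10,000 increments
```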

Asyncio

Asyncio is based on cooperative multitasking. Asyncio tasks run on the same thread, so there is no parallelism, but the developer, rather than the OS as in multithreading, controls where tasks switch.
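To make the cooperative part concrete, here is a small sketch in which two coroutines hand control back to the event loop at an explicit point, `await asyncio.sleep(0)`. The names are illustrative; the point is that a task switch can happen only where the code awaits.

```python
import asyncio


async def worker(name: str, log: list) -> None:
    for step in range(3):
        log.append(f"{name}{step}")
        await asyncio.sleep(0)  # explicit point where control returns to the event loop


async def main() -> list:
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


log = asyncio.run(main())
print(log)  # the two workers alternate: a0, b0, a1, b1, a2, b2
```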

There is a nice discussion on this link regarding the advantages of asyncio over threads.

There is a nice blog by Lei Mao on Python concurrency here

Multiprocessing VS Threading VS AsyncIO in Python Summary

0

Just another perspective

There is a difference in the nature of concurrency in multithreading vs asyncio. Threads can be interleaved at any point of execution. The OS controls when one thread is switched out and another is given a chance (allocated CPU time). There is no consistency or predictability in when threads will be interleaved. That's why you can have race conditions in multithreading. However, asyncio is synchronous as long as you are not awaiting something. The event loop keeps executing a coroutine until it reaches an await, so you can clearly see where coroutines are interleaved: the event loop switches a coroutine out only when it is awaiting. In that sense multithreading is a "true" concurrent model. As I said, asyncio is not concurrent unless you are awaiting. I am not saying asyncio is better or worse.

# Python 3.9.6
import asyncio
import time


async def test(name: str):
    print(f"sleeping: {name}")
    time.sleep(3)  # imagine this is a big chunk of code or a number-crunching block that takes a while to execute
    print(f"awaiting sleep: {name}")

    await asyncio.sleep(2)
    print(f"woke up: {name}")


async def main():
    print("In main")
    tasks = [test(name="1"), test(name="2"), test(name="3")]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())

Output:

In main
sleeping: 1
awaiting sleep: 1
sleeping: 2
awaiting sleep: 2
sleeping: 3
awaiting sleep: 3
woke up: 1
woke up: 2
woke up: 3

You can see that the order is predictable: it is always the same, and each coroutine runs synchronously, with no interleaving, until it reaches an await. With multithreading, by contrast, you cannot predict the order (it can differ from run to run).

sajid