These recommendations are not really recommendations, because otherwise the event loop will block. Consequently, we will lose the main benefit of event programming, correct?
The event loop will block if you call a blocking function (whether I/O-blocking or CPU-blocking) directly in a coroutine instead of awaiting it through an executor. In that sense, yes, you must not let this happen.
The recommendation, I'd say, is to match the executor to the type of blocking code: use ProcessPoolExecutor for CPU-bound work and ThreadPoolExecutor for I/O-bound work.
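To make the difference visible, here is a minimal sketch (not part of the example further down). It assumes a hypothetical `blocking_call` that uses `time.sleep` as a stand-in for any blocking work, and a `heartbeat` coroutine that can only tick while the event loop is free. The names `blocking_call`, `heartbeat` and `count_ticks` are made up for illustration.

```python
import asyncio
import time


def blocking_call():
    # Hypothetical stand-in for any blocking I/O or CPU work.
    time.sleep(0.3)


async def heartbeat(ticks):
    # Records a timestamp every 50 ms, but only while the loop is free to run it.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def count_ticks(use_executor):
    """Return how many heartbeat ticks happened while blocking_call ran."""
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    before = len(ticks)
    if use_executor:
        loop = asyncio.get_running_loop()
        # None selects the loop's default ThreadPoolExecutor.
        await loop.run_in_executor(None, blocking_call)
    else:
        blocking_call()  # called directly: the loop is stuck for 0.3 s
    after = len(ticks)
    task.cancel()
    return after - before


if __name__ == "__main__":
    print("ticks while blocking directly:", asyncio.run(count_ticks(False)))
    print("ticks while in executor:      ", asyncio.run(count_ticks(True)))
```

When the function is called directly, the heartbeat gets no chance to run; when it goes through the executor, the heartbeat should tick several times during the same 0.3 seconds.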
Running the I/O-bound task in a separate thread requires the following assumption: the I/O call will release the GIL, correct? Otherwise the OS would not be able to context switch between the event loop and this new thread.
When it comes to multithreading, CPython switches between threads at very short intervals even when no thread gives up the GIL voluntarily: the running thread is forced to release it after a switch interval (5 ms by default, see sys.getswitchinterval()). But if one or more threads block on I/O (or run C code that releases the GIL), the GIL is released for the duration of that call, so the interpreter can spend that time on the threads that actually need it.
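A rough way to observe this, as a sketch under assumptions: `time.sleep` stands in for a blocking I/O call (it releases the GIL while waiting), and `blocking_wait` and `timed_in_threads` are hypothetical helper names. If the GIL were held during the wait, two threads would need about twice as long as one.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_wait():
    # time.sleep releases the GIL while it waits, like a blocking I/O call.
    time.sleep(0.3)


def timed_in_threads(func, workers=2):
    """Run func once per worker thread and return the total wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(workers) as pool:
        futures = [pool.submit(func) for _ in range(workers)]
        for future in futures:
            future.result()  # re-raises any exception from the worker
    return time.perf_counter() - start


if __name__ == "__main__":
    # Interval after which CPython asks the running thread to drop the GIL.
    print("switch interval:", sys.getswitchinterval())
    # Expected to be close to 0.3 s rather than 0.6 s, since the waits overlap.
    print(f"two waiting threads took {timed_in_threads(blocking_wait):.2f} s")
```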
The bottom line is:
- You can run any blocking code in an executor and it won't block the event loop. You get concurrency, but you may or may not gain performance.
- For example, if you run CPU-bound code in a ThreadPoolExecutor, you won't get a performance benefit from the concurrency because of the GIL. To gain performance for CPU-bound code, use a ProcessPoolExecutor.
- I/O-bound code, however, can run in a ThreadPoolExecutor and you do gain performance. There's no need for the heavier ProcessPoolExecutor here.
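As a small sketch of the I/O-bound case, assuming a hypothetical `fake_fetch` that sleeps instead of doing a real network call: `run_in_executor` forwards positional arguments to the function and the awaited future yields its return value, so results can be collected with `asyncio.gather`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(name, delay):
    # Hypothetical stand-in for a blocking network request.
    time.sleep(delay)
    return f"{name} done"


async def fetch_all(jobs):
    """Run every (name, delay) job in a thread pool and collect the results."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            # Extra positional arguments are passed on to fake_fetch.
            loop.run_in_executor(pool, fake_fetch, name, delay)
            for name, delay in jobs
        ]
        # gather preserves the order of the awaitables it was given.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(fetch_all([("a", 0.3), ("b", 0.3)])))
```

Keyword arguments are not forwarded by `run_in_executor`; wrapping the call in `functools.partial` is the usual way around that.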
I wrote an example to demonstrate how it works:
```python
import asyncio
import concurrent.futures
import sys
import time
from contextlib import contextmanager

import requests

process_pool = concurrent.futures.ProcessPoolExecutor(2)
thread_pool = concurrent.futures.ThreadPoolExecutor(2)


def io_bound():
    for i in range(3):
        requests.get("https://httpbin.org/delay/0.4")  # I/O blocking
        print(f"I/O bound {i}")
        sys.stdout.flush()


def cpu_bound():
    for i in range(3):
        sum(i * i for i in range(10 ** 7))  # CPU blocking
        print(f"CPU bound {i}")
        sys.stdout.flush()


async def run_as_is(func):
    # Calls the blocking function directly: this blocks the event loop.
    func()


async def run_in_process(func):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(process_pool, func)


async def run_in_thread(func):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(thread_pool, func)


@contextmanager
def print_time():
    # Prints the wall-clock time spent inside the with block.
    start = time.time()
    yield
    finished = time.time() - start
    print(f"Finished in {round(finished, 1)}\n")


async def main():
    print("Wrong due to blocking code in coroutine,")
    print(
        "you get neither performance, nor concurrency (which breaks async nature of the code)"
    )
    print("don't allow this to happen")
    with print_time():
        await asyncio.gather(run_as_is(cpu_bound), run_as_is(io_bound))

    print("CPU bound works concurrently with threads,")
    print("but you gain no performance due to GIL")
    with print_time():
        await asyncio.gather(run_in_thread(cpu_bound), run_in_thread(cpu_bound))

    print("To get performance for CPU-bound,")
    print("use process executor")
    with print_time():
        await asyncio.gather(run_in_process(cpu_bound), run_in_process(cpu_bound))

    print("I/O bound will gain benefit from processes as well...")
    with print_time():
        await asyncio.gather(run_in_process(io_bound), run_in_process(io_bound))

    print(
        "... but there's no need in processes since you can use lighter threads for I/O"
    )
    with print_time():
        await asyncio.gather(run_in_thread(io_bound), run_in_thread(io_bound))

    print("Long story short,")
    print("Use processes for CPU bound due to GIL")
    print(
        "and use threads for I/O bound since you benefit from concurrency regardless of GIL"
    )
    with print_time():
        await asyncio.gather(run_in_thread(io_bound), run_in_process(cpu_bound))


if __name__ == "__main__":
    asyncio.run(main())
```
Output:

```
Wrong due to blocking code in coroutine,
you get neither performance, nor concurrency (which breaks async nature of the code)
don't allow this to happen
CPU bound 0
CPU bound 1
CPU bound 2
I/O bound 0
I/O bound 1
I/O bound 2
Finished in 5.3

CPU bound works concurrently with threads,
but you gain no performance due to GIL
CPU bound 0
CPU bound 0
CPU bound 1
CPU bound 1
CPU bound 2
CPU bound 2
Finished in 4.6

To get performance for CPU-bound,
use process executor
CPU bound 0
CPU bound 0
CPU bound 1
CPU bound 1
CPU bound 2
CPU bound 2
Finished in 2.5

I/O bound will gain benefit from processes as well...
I/O bound 0
I/O bound 0
I/O bound 1
I/O bound 1
I/O bound 2
I/O bound 2
Finished in 3.3

... but there's no need in processes since you can use lighter threads for I/O
I/O bound 0
I/O bound 0
I/O bound 1
I/O bound 1
I/O bound 2
I/O bound 2
Finished in 3.1

Long story short,
Use processes for CPU bound due to GIL
and use threads for I/O bound since you benefit from concurrency regardless of GIL
CPU bound 0
I/O bound 0
CPU bound 1
I/O bound 1
CPU bound 2
I/O bound 2
Finished in 2.9
```