
Regarding this code example of Python asyncio run_in_executor:

import asyncio
import concurrent.futures

def blocking_io():
    # File operations (such as logging) can block the
    # event loop: run them in a thread pool.
    with open('/dev/urandom', 'rb') as f:
        return f.read(100)

def cpu_bound():
    # CPU-bound operations will block the event loop:
    # in general it is preferable to run them in a
    # process pool.
    return sum(i * i for i in range(10 ** 7))

async def main():
    loop = asyncio.get_running_loop()

    ## Options:

    # 1. Run in the default loop's executor:
    result = await loop.run_in_executor(
        None, blocking_io)
    print('default thread pool', result)

    # 2. Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

asyncio.run(main())

The comments in the example recommend running the I/O-bound function in a thread pool and the CPU-bound function in a process pool. I want to verify my understanding of the reasons behind this with three questions:

  1. These recommendations are not really optional: otherwise the event loop will block, and consequently we will lose the main benefit of event-driven programming, correct?

  2. Running the I/O-bound task in a separate thread requires the following assumption: the I/O call will release the GIL, correct? Because otherwise the interpreter cannot make progress on both the event loop and this new thread.

  3. If the answer to point 2 is yes, then how can we know for sure whether an I/O call releases the GIL?

adnanmuttaleb

1 Answer


These recommendations are not really optional: otherwise the event loop will block, and consequently we will lose the main benefit of event-driven programming, correct?

The event loop will block if you call a blocking function (whether I/O- or CPU-blocking) in a coroutine without awaiting it through an executor. In that regard, yes: you shouldn't allow this to happen.
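As a minimal illustration of this point (not part of the original answer; the 0.1 s tick and 0.5 s blocking sleep are arbitrary values I chose), here is a sketch showing how one blocking call inside a coroutine stalls every other task on the loop:

```python
import asyncio
import time

async def ticker(intervals):
    # Record the gap between consecutive wake-ups; a responsive
    # event loop keeps these close to 0.1 s.
    last = time.monotonic()
    for _ in range(5):
        await asyncio.sleep(0.1)
        now = time.monotonic()
        intervals.append(now - last)
        last = now

async def blocker():
    time.sleep(0.5)  # blocking call executed right on the event loop

async def main():
    intervals = []
    await asyncio.gather(ticker(intervals), blocker())
    return intervals

intervals = asyncio.run(main())
# The first gap is stretched well past 0.1 s because the loop was
# frozen by the blocking sleep; the rest are back to ~0.1 s.
print([round(i, 2) for i in intervals])
```

Replacing `time.sleep(0.5)` with `await loop.run_in_executor(None, time.sleep, 0.5)` keeps all the gaps near 0.1 s, which is exactly what the answer means by not blocking the loop.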

The recommendation, I'd say, is about which type of executor fits each type of blocking code: use a ProcessPoolExecutor for CPU-bound work and a ThreadPoolExecutor for I/O-bound work.

Running the I/O-bound task in a separate thread requires the following assumption: the I/O call will release the GIL, correct? Because otherwise the interpreter cannot make progress on both the event loop and this new thread.

With plain Python multithreading, the interpreter switches between threads every few milliseconds, but only one thread runs Python bytecode at a time under the GIL. However, when a thread performs I/O (or calls into C code that releases the GIL), the GIL is released for the duration of that call, allowing the interpreter to spend that time on the threads that need it.
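You can observe this difference directly. The sketch below is my own (the function names, 0.5 s sleep, and 2,000,000-iteration loop are arbitrary choices, and it assumes a standard GIL-enabled CPython build): `time.sleep` releases the GIL much like a real I/O wait, so two sleeping threads overlap, while two pure-Python CPU loops serialize on the GIL:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_like():
    time.sleep(0.5)  # sleeping releases the GIL, like waiting on an I/O syscall

def cpu_like():
    sum(i * i for i in range(2_000_000))  # pure-Python loop: holds the GIL

def timed(func, workers=2):
    # Run `func` once per worker thread and return total wall time;
    # leaving the `with` block waits for all submitted work to finish.
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers):
            pool.submit(func)
    return time.monotonic() - start

io_elapsed = timed(io_like)    # ~0.5 s: both sleeps overlap, GIL released
cpu_elapsed = timed(cpu_like)  # roughly both runs back to back under the GIL
print(round(io_elapsed, 2), round(cpu_elapsed, 2))
```

The I/O-like pair finishes in about the time of a single sleep, while the CPU-bound pair takes roughly the sum of both runs, which is the whole case for ProcessPoolExecutor with CPU-bound work.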

The bottom line is:

  • You can run any blocking code in an executor and it won't block the event loop. You get concurrency, but you may or may not gain performance.
  • For example, if you run CPU-bound code in a ThreadPoolExecutor, you won't get a performance benefit from concurrency due to the GIL. To gain performance for CPU-bound work, you should use a ProcessPoolExecutor.
  • But I/O-bound code can run in a ThreadPoolExecutor, and you do gain performance. There's no need for the heavier ProcessPoolExecutor here.

I wrote an example to demonstrate how it works:

import sys
import asyncio
import time
import concurrent.futures
import requests
from contextlib import contextmanager

process_pool = concurrent.futures.ProcessPoolExecutor(2)
thread_pool = concurrent.futures.ThreadPoolExecutor(2)


def io_bound():
    for i in range(3):
        requests.get("https://httpbin.org/delay/0.4")  # I/O blocking
        print(f"I/O bound {i}")
        sys.stdout.flush()


def cpu_bound():
    for i in range(3):
        sum(i * i for i in range(10 ** 7))  # CPU blocking
        print(f"CPU bound {i}")
        sys.stdout.flush()


async def run_as_is(func):
    func()


async def run_in_process(func):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(process_pool, func)


async def run_in_thread(func):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(thread_pool, func)


@contextmanager
def print_time():
    start = time.time()
    yield
    finished = time.time() - start
    print(f"Finished in {round(finished, 1)}\n")


async def main():
    print("Wrong due to blocking code in coroutine,")
    print(
        "you get neither performance, nor concurrency (which breaks async nature of the code)"
    )
    print("don't allow this to happen")
    with print_time():
        await asyncio.gather(run_as_is(cpu_bound), run_as_is(io_bound))

    print("CPU bound works concurrently with threads,")
    print("but you gain no performance due to GIL")
    with print_time():
        await asyncio.gather(run_in_thread(cpu_bound), run_in_thread(cpu_bound))

    print("To get performance for CPU-bound,")
    print("use process executor")
    with print_time():
        await asyncio.gather(run_in_process(cpu_bound), run_in_process(cpu_bound))

    print("I/O bound will gain benefit from processes as well...")
    with print_time():
        await asyncio.gather(run_in_process(io_bound), run_in_process(io_bound))

    print(
        "... but there's no need in processes since you can use lighter threads for I/O"
    )
    with print_time():
        await asyncio.gather(run_in_thread(io_bound), run_in_thread(io_bound))

    print("Long story short,")
    print("Use processes for CPU bound due to GIL")
    print(
        "and use threads for I/O bound since you benefit from concurrency regardless of GIL"
    )
    with print_time():
        await asyncio.gather(run_in_thread(io_bound), run_in_process(cpu_bound))


if __name__ == "__main__":
    asyncio.run(main())

Output:

Wrong due to blocking code in coroutine,
you get neither performance, nor concurrency (which breaks async nature of the code)
don't allow this to happen
CPU bound 0
CPU bound 1
CPU bound 2
I/O bound 0
I/O bound 1
I/O bound 2
Finished in 5.3

CPU bound works concurrently with threads,
but you gain no performance due to GIL
CPU bound 0
CPU bound 0
CPU bound 1
CPU bound 1
CPU bound 2
CPU bound 2
Finished in 4.6

To get performance for CPU-bound,
use process executor
CPU bound 0
CPU bound 0
CPU bound 1
CPU bound 1
CPU bound 2
CPU bound 2
Finished in 2.5

I/O bound will gain benefit from processes as well...
I/O bound 0
I/O bound 0
I/O bound 1
I/O bound 1
I/O bound 2
I/O bound 2
Finished in 3.3

... but there's no need in processes since you can use lighter threads for I/O
I/O bound 0
I/O bound 0
I/O bound 1
I/O bound 1
I/O bound 2
I/O bound 2
Finished in 3.1

Long story short,
Use processes for CPU bound due to GIL
and use threads for I/O bound since you benefit from concurrency regardless of GIL
CPU bound 0
I/O bound 0
CPU bound 1
I/O bound 1
CPU bound 2
I/O bound 2
Finished in 2.9
Mikhail Gerasimov
  • Thank you for your great answer. Just to be sure I understood correctly: multithreading in the case of I/O will always result in a performance gain, but if the GIL is released there will be an additional gain, due to smarter scheduling (in contrast to time-sharing), correct? – adnanmuttaleb Dec 23 '21 at 10:50
  • And one more clarification: in the case of a third-party I/O library, we can be sure the GIL will be released if it uses Python's standard I/O; otherwise we need to check whether its C implementation releases the GIL, correct? (assuming that not all C code releases the GIL). – adnanmuttaleb Dec 23 '21 at 10:55
  • "in case of i/o will always result in performance gain" – yes. "GIL is released there will be an additional gain" – I think that should be the case compared to when it's not released, but I'm not sure that situation can happen at all (in the case of Python code): most of the time during I/O we just wait on some I/O syscall, and I see no reason why Python shouldn't release the GIL meanwhile. – Mikhail Gerasimov Dec 23 '21 at 13:57
  • "we need to check whether its C implementation releases the GIL, correct?" – I think so. But on the other hand, if you run it in a thread, you have already gained most of the performance boost just by making the I/O concurrent; I don't think GIL releasing/non-releasing is a big deal in comparison. – Mikhail Gerasimov Dec 23 '21 at 13:57