How does ThreadPoolExecutor utilise 32 CPU cores for CPU bound tasks?

Question

Changed in version 3.8: Default value of max_workers is changed to min(32, os.cpu_count() + 4). This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL. And it avoids using very large resources implicitly on many-core machines.

According to my understanding of the GIL, thread-based concurrency is only possible for I/O-bound tasks. For CPU-bound tasks it is NOT possible: the GIL forces single-threaded execution. My understanding appears to contradict the line in the ThreadPoolExecutor documentation quoted above that says it "utilizes at most 32 CPU cores for CPU bound tasks which release the GIL". What am I misunderstanding here?

Furthermore, what does

which release the GIL

mean? Don't CPU-bound tasks keep hold of the GIL (unless they are preempted)?

From this answer, I suspect this has something to do with

spending most of its time in an external library designed to release the GIL (like NumPy)

Does that mean thread-based concurrency for CPU-bound tasks is actually possible, provided that the threads do their CPU-bound work inside an external library "designed to release the GIL"?

Libraries written in C can release the GIL if they don't need it for whatever they're doing. — user2357112, Sep 08 '21 at 08:18

score 3 · Accepted Answer · answered Sep 08 '21 at 08:20

Yes, exactly. Since the GIL protects Python interpreter state, a library can release the lock if it has a significant amount of work to do that doesn't involve accessing Python variables or calling Python functions. NumPy is one such library that can frequently do this.
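To see the difference for yourself, here is a minimal sketch that avoids the NumPy dependency by using hashlib from the standard library, which is documented to release the GIL while hashing inputs larger than 2047 bytes. The helper names (pure_python_sum, sha256_digest, timed_map) and the workload sizes are made up for illustration, and the timings depend entirely on your machine and core count, so treat the printed numbers as indicative only. On a standard (GIL-enabled) CPython build I would expect the pure-Python workload to gain little or nothing from extra threads, while the hashing workload should scale.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def pure_python_sum(n):
    """CPU-bound work done in Python bytecode: the thread needs the GIL throughout."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sha256_digest(buf):
    """CPU-bound work done in C: hashlib releases the GIL for large inputs."""
    return hashlib.sha256(buf).hexdigest()


def timed_map(func, args, workers):
    """Run func over args in a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - start


def main():
    # Illustrative workload sizes (assumptions, tune for your machine).
    counts = [2_000_000] * 4
    buffers = [bytes(16 * 1024 * 1024) for _ in range(4)]

    for label, func, args in [
        ("pure Python loop", pure_python_sum, counts),
        ("hashlib.sha256", sha256_digest, buffers),
    ]:
        _, one_thread = timed_map(func, args, workers=1)
        _, four_threads = timed_map(func, args, workers=4)
        print(f"{label}: 1 worker {one_thread:.2f}s, 4 workers {four_threads:.2f}s")


if __name__ == "__main__":
    main()
```

Both functions are "CPU bound" in the everyday sense. The only difference is whether the heavy lifting happens in code that holds the GIL or in C code that has released it, which is the distinction the documentation is making.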
