
I know that Python has a GIL that forces threads to execute on only one core at a time. But I created one process per processor core, and in each process I create threads. In theory, will the threads run in parallel within each process? And if this works, how can I synchronize everything while using Pool?

```
from multiprocessing import Pool
from concurrent.futures import ThreadPoolExecutor

def make_threads(data):
    # Each worker process runs its own thread pool over its chunk of data.
    with ThreadPoolExecutor(len(data)) as executor:
        return list(executor.map(some_function, data))

def main():
    with Pool(processes_count) as p:
        answer = list(p.map(make_threads, data))
```
3NiGMa
    The threads in each worker process will still be GIL-bound, so if the work is all CPU bound, skip the threads, and just use the process pool. – ShadowRanger Mar 23 '20 at 21:41
  • I ran into a problem where I cannot use more than 62 processes, while there is no such restriction for threads. Threads in each worker process will still be bound to the GIL, but will threads in different processes run in parallel? If so, this is better than just a thread pool. – Just Relax Mar 23 '20 at 23:02
  • Unless the threads are doing IO-bound stuff, you're only going to have one thread per process doing anything. So there's seldom a good reason to mix the two. Just spawn one process per CPU core and send all your work to them (the pool types are designed to split things up for you). – Blckknght Mar 24 '20 at 00:07
  • If you are working with a tool like numpy/pandas/scikit that releases the GIL for its calculations then threads will run in parallel. – tdelaney Mar 24 '20 at 00:10
  • How many processors does your system have? If the work is CPU-intensive, more processes than cores tends to slow things down through extra scheduling and memory usage. Limit your process pool to the number of cores (or even a few less, because other processes run too) and let `map` do the scheduling. – tdelaney Mar 24 '20 at 00:12
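To illustrate the sizing advice in the last comment, here is a minimal sketch (with a hypothetical `some_function`, not the asker's actual code) that caps the pool at the core count and lets `map` split the data into chunks:

```
import os
from multiprocessing import Pool

def some_function(item):
    # Hypothetical stand-in for the real per-item work.
    return item * item

if __name__ == "__main__":
    data = range(100)
    # One worker per core; map() chunks the iterable and schedules it.
    with Pool(os.cpu_count()) as p:
        answer = p.map(some_function, data)
        print(answer[:5])
```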

1 Answer


Use concurrent.futures instead of multiprocessing or threading if you can - it's a better API that lets you conveniently switch from threads to processes, or vice versa.
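For example, a minimal sketch (with a hypothetical `work` function) of the same code driving either executor type:

```
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def work(x):
    # Hypothetical task; must be a top-level function so processes can pickle it.
    return x * x

def run(executor_cls, data):
    # Identical code path for threads and processes.
    with executor_cls(max_workers=4) as executor:
        return list(executor.map(work, data))

if __name__ == "__main__":
    print(run(ThreadPoolExecutor, range(10)))   # threads: suits I/O-bound work
    print(run(ProcessPoolExecutor, range(10)))  # processes: suits CPU-bound work
```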

Within a given Python process, threads handle I/O-bound tasks well but CPU-bound tasks poorly. And if you have 20 I/O-bound threads and one CPU-bound thread in a single process, they will all have performance problems - the CPU-bound thread holds the GIL and starves the I/O-bound threads.
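One common workaround, sketched here with hypothetical `cpu_heavy` and `io_task` functions, is to push the CPU-bound work into a separate process so it stops competing with the I/O threads for the GIL:

```
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import urllib.request

def cpu_heavy(n):
    # CPU-bound: holds the GIL for the whole computation.
    return sum(i * i for i in range(n))

def io_task(url):
    # I/O-bound: releases the GIL while waiting on the network.
    with urllib.request.urlopen(url) as resp:
        return resp.status

if __name__ == "__main__":
    urls = ["https://example.com"] * 20
    with ProcessPoolExecutor() as procs, ThreadPoolExecutor(max_workers=20) as threads:
        cpu_future = procs.submit(cpu_heavy, 10_000_000)  # runs in another process
        statuses = list(threads.map(io_task, urls))       # I/O threads stay responsive
        print(cpu_future.result(), statuses[:3])
```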

Queues are often the best way to communicate between processes and threads.
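As a sketch, here is one hypothetical worker process fed through a pair of multiprocessing queues, with None as a stop sentinel:

```
from multiprocessing import Process, Queue

def worker(in_q, out_q):
    # Pull items until the None sentinel arrives, push results back.
    for item in iter(in_q.get, None):
        out_q.put(item * item)

if __name__ == "__main__":
    in_q, out_q = Queue(), Queue()
    p = Process(target=worker, args=(in_q, out_q))
    p.start()
    for i in range(5):
        in_q.put(i)
    in_q.put(None)                       # tell the worker to stop
    results = [out_q.get() for _ in range(5)]
    p.join()
    print(results)                       # [0, 1, 4, 9, 16]
```

The same pattern works with queue.Queue for threads within one process.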

dstromberg
  • I used concurrent.futures with ProcessPoolExecutor; the script executed, but an error occurs at the end: ```OSError: handle is closed```. I did not find an answer on the forums, so I decided to use multiprocessing. The threads work with Selenium: each thread opens a webdriver, sends a GET request, and closes the webdriver. That's all. I do not know which is better in terms of performance. – Just Relax Mar 23 '20 at 23:09
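For reference, a minimal sketch of the workload that comment describes - one webdriver per thread: open, fetch, close - assuming Chrome and a hypothetical URL list (not the commenter's actual code). Since this work is almost entirely I/O-bound, a plain thread pool should parallelize it well:

```
from concurrent.futures import ThreadPoolExecutor
from selenium import webdriver

def fetch(url):
    # One webdriver per thread: open, get, close.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()

if __name__ == "__main__":
    urls = ["https://example.com"] * 5   # hypothetical URL list
    with ThreadPoolExecutor(max_workers=5) as executor:
        titles = list(executor.map(fetch, urls))
        print(titles)
```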