
I have a program with one process that starts a lot of threads. Each thread might use subprocess.Popen to run some command. I see that the time it takes to run the command increases with the number of threads. Example:

>>> def foo():
...     s = time.time()
...     subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
...     print(time.time() - s)
...
>>> foo()
0.028950929641723633
>>> [threading.Thread(target=foo).start() for _ in range(10)]
0.058995723724365234
0.07323050498962402
0.09158825874328613
0.11541390419006348 # !!!
0.08147192001342773
0.05238771438598633
0.0950784683227539
0.10175108909606934 # !!!
0.09703755378723145
0.06497764587402344

Is there another way of executing a lot of commands from a single process in parallel that doesn't degrade performance?

Bob Sacamano

2 Answers


Python's threads are, of course, concurrent, but they do not really run in parallel because of the GIL. Therefore, they are not suitable for CPU-bound applications. If you need to truly parallelize something and allow it to run on all CPU cores, you will need to use multiple processes. Here is a nice answer discussing this in more detail: What are the differences between the threading and multiprocessing modules?.

For the above example, multiprocessing.pool.Pool may be a good choice (note that there is also a ThreadPool available in the same module; see the sketch after the code below).

from multiprocessing.pool import Pool
import subprocess
import time

def foo(*args):
    s = time.time()
    subprocess.Popen('ip link show'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True).communicate()
    return time.time() - s

if __name__ == "__main__":
    with Pool(10) as p:

        result = p.map(foo, range(10))
        print(result)
        # [0.018695592880249023, 0.009021520614624023, 0.01150059700012207, 0.02113938331604004, 0.014114856719970703, 0.01342153549194336, 0.011168956756591797, 0.014746427536010742, 0.013572454452514648, 0.008752584457397461]

        result = p.map_async(foo, range(10))
        print(result.get())
        # [0.00636744499206543, 0.011589527130126953, 0.010645389556884766, 0.0070612430572509766, 0.013571739196777344, 0.009610414505004883, 0.007040739059448242, 0.010993719100952148, 0.012415409088134766, 0.0070383548736572266]
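Since foo mostly just waits on I/O from the child process, the ThreadPool mentioned above may work just as well while avoiding the overhead of spawning worker processes. A minimal sketch reusing the same foo as above (the numbers will of course differ):

from multiprocessing.pool import ThreadPool

if __name__ == "__main__":
    # Same interface as Pool, but the workers are threads in this process;
    # that is usually enough when foo just waits on a subprocess.
    with ThreadPool(10) as tp:
        print(tp.map(foo, range(10)))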

However, if your real function is like the example in that it mostly just launches other processes and doesn't do much computation itself, I doubt that parallelizing it will make much of a difference, because the subprocesses can already run in parallel. Perhaps the slowdown occurs because the whole system is briefly overwhelmed by all those processes (CPU usage may spike, or too many disk reads/writes may be attempted within a short time). I would suggest taking a close look at system resources (Task Manager, top, etc.) while the program runs.
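In fact, because Popen returns without waiting for the child to finish, you can launch all the commands up front from a single thread and only then collect their output. A stdlib-only sketch (the command and the count are just for illustration, assuming a Linux box where ip is available):

import subprocess
import time

cmd = 'ip link show'.split()

s = time.time()
# Start all child processes first; Popen does not block until the command finishes.
procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True) for _ in range(10)]
# Only now wait for each one and read its output.
outputs = [p.communicate() for p in procs]
print(time.time() - s)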

Czaporka

Maybe it has nothing to do with Python: opening a new shell means opening new files, since basically everything is a file on Linux.

Take a look at your limit on open files with this command (the default is often 1024):

ulimit -n

and try to raise it with this command to see if your code gets faster:

ulimit -n 2048
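If you prefer to inspect or raise the limit from within Python rather than the shell, the standard resource module exposes the same setting. A rough sketch, assuming Linux (the soft limit can only be raised up to the hard limit without extra privileges):

import resource

# Current (soft, hard) limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# Raise the soft limit, staying within the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(2048, hard), hard))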
Reda Bourial