
Let's say I have some task.py that takes 5 minutes to run.

I need to run task.py with 1,000 different inputs. Each run of task.py is completely independent and low memory. I don't care that they all finish at the same time, just that they finish.

I know I could use multiprocessing or multithreading, but is there any good reason not to do the following:

import subprocess
import sys
import numpy as np

for arg in np.arange(0, 1.01, 0.1):
    print(arg)
    # Popen returns a process handle (not just a PID) and does not wait for it
    proc = subprocess.Popen(
        [sys.executable, "C:/task.py", "--arg", str(arg)])
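
For completeness, a minimal variant of the loop above that keeps the Popen handles and waits on them, so the launching script only exits once every run has finished (this reuses the C:/task.py path and --arg flag from the snippet; the 0–1 range is just a stand-in for the real 1,000 inputs):

import subprocess
import sys
import numpy as np

procs = []
for arg in np.arange(0, 1.01, 0.1):
    # Launch task.py without waiting; keep the handle so we can wait later
    procs.append(subprocess.Popen(
        [sys.executable, "C:/task.py", "--arg", str(arg)]))

# Block until every launched task.py has exited
for proc in procs:
    proc.wait()
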
  • if the workload is completely independent and the tasks don't return any input then I think it's down to preference – Joshua Nixon Jun 12 '19 at 21:29
  • What if I move (or just change name) the `task.py` script? (not to mention Windows dependency) At the moment you won't even know that it doesn't work. What if I run your code on system that has `sys.executable` as `None`? – freakish Jun 12 '19 at 21:30
  • @freakish - I appreciate the input, but the question is more about thread-leakage and potentially issues that are lower-level. – jason m Jun 12 '19 at 21:41
  • I think we could answer better if we knew what the task is. – Joshua Nixon Jun 12 '19 at 21:44
  • Using `subprocess` (or `multiprocessing`) is going to launch that many separate processes each running their own instance of the Python interpreter to execute `task.py` all at once — which is a pretty heavy load to put on the OS and may actually slow things down overall. Multithreading wouldn't do that, so might be preferable, but will still use up a lot of system resources. I suggest you use a `concurrent.futures.Executor` which can do either (multithreading or multiprocessing), but more importantly, makes it easy to limit the number of them that can run concurrently. – martineau Jun 12 '19 at 21:58
  • @martineau Yes I think this the most reasonable answer. Without doing more testing, it is obviously a function of memory and cores. – jason m Jun 13 '19 at 23:15
  • jason: It's also easy to configure the maximum number of "workers" (aka `max_workers`) when using a `concurrent.futures.Executor` subclass like `ThreadPoolExecutor` or `ProcessPoolExecutor` — making it relatively easy to limit how many resources they'll consume, CPU- and memory-wise, at runtime based on what's available on the system they're being run on. – martineau Jun 13 '19 at 23:38 (a sketch of this approach follows below)
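
For what it's worth, here is a minimal sketch of the concurrent.futures approach martineau describes, assuming task.py still has to be invoked as a separate script (the max_workers=8 cap is an arbitrary placeholder; a ProcessPoolExecutor could be swapped in the same way if the per-task work were done directly in Python rather than in a child process):

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def run_task(arg):
    # Each call launches one task.py run and blocks until it exits
    return subprocess.run(
        [sys.executable, "C:/task.py", "--arg", str(arg)]).returncode

args = np.arange(0, 1.01, 0.1)  # stand-in for the real 1,000 inputs
# max_workers caps how many copies of task.py exist at any one time
with ThreadPoolExecutor(max_workers=8) as executor:
    for arg, returncode in zip(args, executor.map(run_task, args)):
        print(arg, returncode)

Because each worker thread just sits waiting on its child process, a thread pool is enough to throttle the launches here; the heavy lifting still happens in the separate task.py processes.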

0 Answers