
Let's say I have some task.py that takes 5 minutes to run.

I need to run task.py with 1,000 different inputs. Each run of task.py is completely independent and low memory. I don't care that they all finish at the same time, just that they finish.

I know I could use multiprocessing or multithreading, but is there any good reason not to do the following:

import subprocess
import sys
import numpy as np

for arg in np.arange(0, 1.01, 0.1):
    print(arg)
    # Popen returns a process handle (not just a PID) and does not wait for it
    proc = subprocess.Popen(
        [sys.executable, "C:/task.py", "--arg", str(arg)])
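
For completeness, a minimal variant of the loop above that keeps the Popen handles and waits on them, so the launching script only exits once every run has finished (this reuses the C:/task.py path and --arg flag from the snippet; the 0–1 range is just a stand-in for the real 1,000 inputs):

import subprocess
import sys
import numpy as np

procs = []
for arg in np.arange(0, 1.01, 0.1):
    # Launch task.py without waiting; keep the handle so we can wait later
    procs.append(subprocess.Popen(
        [sys.executable, "C:/task.py", "--arg", str(arg)]))

# Block until every launched task.py has exited
for proc in procs:
    proc.wait()
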
  • if the workload is completely independent and the tasks don't return any input then I think it's down to preference – Joshua Nixon Jun 12 '19 at 21:29
  • What if I move (or just change name) the `task.py` script? (not to mention Windows dependency) At the moment you won't even know that it doesn't work. What if I run your code on system that has `sys.executable` as `None`? – freakish Jun 12 '19 at 21:30
  • @freakish - I appreciate the input, but the question is more about thread-leakage and potentially issues that are lower-level. – jason m Jun 12 '19 at 21:41
  • I think we could answer better if we knew what the task is. – Joshua Nixon Jun 12 '19 at 21:44
  • Using `subprocess` (or `multiprocessing`) is going to launch that many separate processes each running their own instance of the Python interpreter to execute `task.py` all at once — which is a pretty heavy load to put on the OS and may actually slow things down overall. Multithreading wouldn't do that, so might be preferable, but will still use up a lot of system resources. I suggest you use a `concurrent.futures.Executor` which can do either (multithreading or multiprocessing), but more importantly, makes it easy to limit the number of them that can run concurrently. – martineau Jun 12 '19 at 21:58
  • @martineau Yes I think this the most reasonable answer. Without doing more testing, it is obviously a function of memory and cores. – jason m Jun 13 '19 at 23:15
  • jason: It's also easy to configure the maximum number of "workers" (aka `max_workers`) when using a `concurrent.futures.Executor` subclass like `ThreadPoolExecutor` or `ProcessPoolExecutor` — making it relatively easy to limit how many resources they'll consume, CPU- and memory-wise, at runtime based on what's available on the system they're being run on. – martineau Jun 13 '19 at 23:38 (a sketch of this approach follows below)
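
For what it's worth, here is a minimal sketch of the concurrent.futures approach martineau describes, assuming task.py still has to be invoked as a separate script (the max_workers=8 cap is an arbitrary placeholder; a ProcessPoolExecutor could be swapped in the same way if the per-task work were done directly in Python rather than in a child process):

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def run_task(arg):
    # Each call launches one task.py run and blocks until it exits
    return subprocess.run(
        [sys.executable, "C:/task.py", "--arg", str(arg)]).returncode

args = np.arange(0, 1.01, 0.1)  # stand-in for the real 1,000 inputs
# max_workers caps how many copies of task.py exist at any one time
with ThreadPoolExecutor(max_workers=8) as executor:
    for arg, returncode in zip(args, executor.map(run_task, args)):
        print(arg, returncode)

Because each worker thread just sits waiting on its child process, a thread pool is enough to throttle the launches here; the heavy lifting still happens in the separate task.py processes.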

0 Answers