
I have a shell script that uses GNU Parallel to run a function in parallel. Now I am rewriting the script in Python and I don't know how to do this correctly.

In the script, I have:

parallel --jobs 5 --linebuffer run {1} ::: "${files[@]}"

How can I convert this to Python code? In the shell script, `files` is an array of files, and `run` calls an external program that processes each file.

In Python, I have a method `def run(file)` that runs several Python commands to prepare data and, at the end, calls the external program with `os.system`.

def run(file):
    # do something with the input file
    os.system(...)
Martin Perry
  • @mkrieger1 not exactly. I don't want to use `Popen`. I have a `run` method in Python, and inside this method is the `os.system` call. Threading seems more reasonable, but I don't know whether to use it or multiprocessing. Also, I don't know how to pass the array of files. – Martin Perry Oct 09 '21 at 11:07
  • Why do you not want to use `Popen`? – mkrieger1 Oct 09 '21 at 11:17
  • How can I call Python method with Popen? – Martin Perry Oct 09 '21 at 11:20
  • Note that processes will be created anyway by the `os.system` call. Creating processes for calling internal Python methods using Popen is not great though, because it is often too low level. Using [processing pools (e.g. with map)](https://docs.python.org/3/library/multiprocessing.html) is often simpler and better. If you know that your processing is IO bound and written in Python, then threads are better (due to the GIL). – Jérôme Richard Oct 09 '21 at 11:55
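Since the last comment suggests threads for IO-bound work, here is a minimal sketch using `concurrent.futures.ThreadPoolExecutor` instead of `multiprocessing`. The file names are placeholders, and `echo` stands in for the real external program:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run(file):
    # prepare data here, then call the external program;
    # "echo" is a stand-in for the real tool
    result = subprocess.run(["echo", file], capture_output=True, text=True)
    return result.stdout.strip()

files = ["a.txt", "b.txt", "c.txt"]  # placeholder file list
with ThreadPoolExecutor(max_workers=5) as ex:
    # 5 concurrent workers, like --jobs 5; map preserves input order
    outputs = list(ex.map(run, files))
```

Because `run` mostly waits on a subprocess, the GIL is released while it blocks, so threads give real concurrency here without the pickling overhead of `multiprocessing`.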

1 Answer


I would use `multiprocessing`:

import os
import sys
from multiprocessing import Pool

def run(file):
    # do something with the input file
    os.system(...)

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(run, sys.argv[1:])

Call it with:

python test.py "${files[@]}"
Philippe