#!/bin/bash

data_dir=./all
for file_name in "$data_dir"/*
do
  echo "$file_name"
  python process.py "$file_name"
done

For example, this script processes the files in a directory sequentially in a 'for' loop. Is it possible to start multiple process.py instances so that the files are processed concurrently? I want to do this in a shell script.

marlon

3 Answers

1

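It's better to do this from Python: use os.listdir to list the files and subprocess.Popen to start a new process for each one. Popen returns as soon as the process has started, without waiting for it to finish, so the processes run concurrently.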
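
A minimal sketch of this approach, assuming process.py takes one file path as its only argument, as in the question. The function name `process_all` is hypothetical, and `sys.executable` is used instead of a bare "python" so that the children run under the same interpreter as the launcher. Note that the program and each of its arguments are separate list items.

```python
import os
import subprocess
import sys


def process_all(data_dir, script="process.py"):
    """Start one `script` process per entry in data_dir, then wait for all."""
    procs = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        # Each argument is its own list element. A single string such as
        # "python process.py" would be looked up as one executable name.
        procs.append((path, subprocess.Popen([sys.executable, script, path])))

    # Popen returned immediately for every file, so all children are
    # already running here. Waiting only after the loop keeps them concurrent.
    return {path: proc.wait() for path, proc in procs}
```

Calling `process_all("./all")` returns a dict that maps each path to the exit code of its process. Calling `wait()` or `communicate()` inside the loop instead would make the files run one after another.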

Yehor Smoliakov
  • So, if I use subprocess.Popen() to start a new process for each file in a list, one process won't wait for the other to complete? I want multiple Python instances to process files at the same time. – marlon May 11 '22 at 20:47
  • Could you write an answer based on the link? – marlon May 11 '22 at 20:49
  • Yes, all the processes will run independently. You can call the `communicate()` method to block until the running process has finished before starting the next one. – Yehor Smoliakov May 11 '22 at 20:49
  • So I shouldn't use 'communicate()' for concurrency? – marlon May 11 '22 at 20:52
  • I tried subprocess.Popen(["python process.py", "all"]), but this gives an error. 'all' is my data directory. – marlon May 11 '22 at 20:55
0

Here is another possibility, if you still need one. It uses the screen command to start each supplied command in its own detached session, so the loop does not wait for one file to finish before starting the next.

Here is an example:

#!/bin/bash

data_dir=./all
for file_name in "$data_dir"/*
do
  echo "$file_name"
  screen -dm python process.py "$file_name"
done
cfgn
0

With GNU Parallel, like this:

parallel python process.py {} ::: all/*

By default it will run N jobs in parallel, where N is the number of CPU cores you have. You can also specify -j4 to run just 4 jobs at a time, for example.

Many, many options for:

  • logging,
  • splitting/chunking inputs,
  • tagging/separating output,
  • staggering job starts,
  • massaging input parameters,
  • fail and retry handling,
  • distributing jobs and data to other machines,
  • and so on...

Try putting [gnu-parallel] in the Stack Overflow search box.
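
For comparison, the "at most N jobs at a time" behaviour described above can be sketched in plain Python. This is not GNU Parallel and has none of its options. It is a rough illustration that swaps in a thread pool from `concurrent.futures`, where the function name `process_with_limit` and its `jobs` parameter are hypothetical.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def process_with_limit(data_dir, script="process.py", jobs=None):
    """Run `script` on every entry in data_dir, at most `jobs` at a time."""
    # Default to the number of CPU cores, falling back to 1 if unknown.
    jobs = jobs or os.cpu_count() or 1
    paths = sorted(os.path.join(data_dir, n) for n in os.listdir(data_dir))

    def run(path):
        # subprocess.run blocks its worker thread until the child exits,
        # so the pool size caps the number of child processes alive at once.
        return subprocess.run([sys.executable, script, path]).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map yields results in the same order as `paths`.
        return dict(zip(paths, pool.map(run, paths)))
```

Calling `process_with_limit("all", jobs=4)` plays roughly the role of -j4: four process.py instances run at once, and a new one starts whenever one finishes.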

Mark Setchell