I have a cluster node with 40 cores (as reported by nproc), and my software uses ten cores per run. I want to run 4 tasks at a time from a single bash script (job script) and, whenever one of them completes, start another one. The loop needs to cover all of the input files (approx. 600). Each command runs in its own folder, n times.

Currently, I am using the following script:

#! /bin/bash

for f in lib1/*.png; do
    for g in lib2/*.png; do
        for h in lib3/*.png; do
            for i in lib4/*.png; do
                ./MySoftyware --input $f --out $f_out.png &
                P1=$!
                ./MySoftyware --input $g --out $g_out.png &
                P2=$!
                ./MySoftyware --input $h --out $h_out.png &
                P3=$!
                ./MySoftyware --input $i --out $i_out.png &
                P4=$!
                wait $P1 $P2 $P3 $P4
            done
        done
    done
done

The issue is that the next round only starts once all four processes have completed, but I want each command to move on to its next available file as soon as it finishes. E.g., if $f completes before $i, the script should start the next file in the $f loop and not wait for $i to complete.

Ravi Saini
  • No, this one runs a single command with max CPUs. I am restricted to 4 parallel commands that are independent of each other – Ravi Saini Feb 27 '22 at 23:23
  • define a function that accepts as input the sub-directory (eg, `lib1`) and processes (serially) all of the files in said sub-directory; at the top/parent level you call the function 4 times, once for each sub-directory, making sure to push the function call into the background (eg, `function lib1 &`), and then have the parent `wait` for all 4 function calls to complete – markp-fuso Feb 28 '22 at 00:16
  • [Shellcheck](https://www.shellcheck.net/) finds several problems with the code. The most serious of them is that the `f_out`, `g_out`, `h_out`, and `i_out` variables are not initialized. All `--out` arguments are just `.png`. I suggest you provide working code that processes one file at a time and somebody might be able to help you optimize it to process 4 files at a time. For instance, does `for f in lib1/*.png lib2/*.png lib3/*.png lib4/*.png; do ./MySoftyware --input "$f" --out "${f%.png}_out.png"; done` do what you want? If not, why not? – pjh Feb 28 '22 at 02:45
  • GNU Parallel is a great tool for parallelising jobs, and you can use `--max-procs=4` to run at most four processes in parallel. – l0b0 Feb 28 '22 at 02:58
  • Also see [ProcessManagement - Greg's Wiki](https://mywiki.wooledge.org/ProcessManagement), particularly the "I want to process a bunch of files in parallel, and when one finishes, I want to start the next. And I want to make sure there are exactly 5 jobs running at a time." section. – pjh Feb 28 '22 at 10:38
  • @pjh Ignore `$f_out.png`; just treat the output as `out.png`. – Ravi Saini Mar 01 '22 at 00:46
  • @l0b0 Will GNU Parallel work on high-performance clusters (HPCs)? I am not that used to GNU Parallel. – Ravi Saini Mar 01 '22 at 00:47
  • I've no idea what your particular setup is Ravi; I'd suggest looking at the documentation. – l0b0 Mar 01 '22 at 00:58
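
The approach markp-fuso describes in the comments could be sketched roughly as below. This is only a minimal sketch, not tested against the real program: `process_dir` is a hypothetical helper name, and the `<name>_out.png` output naming is an assumption borrowed from pjh's comment. Each directory gets its own background worker that runs through its files serially, so a directory that finishes a file early moves straight on to its next file without waiting for the other three.

```shell
#!/bin/bash
# Sketch: one background worker per directory, each handling its files one by one.
shopt -s nullglob   # a directory with no .png files yields zero loop iterations

# process_dir is a hypothetical helper, not part of the original script.
process_dir() {
    local dir=$1 f
    # The glob is expanded once, before the loop starts, so outputs written
    # during this run are not picked up as new inputs. On a re-run they would
    # match *.png, so writing outputs to a separate folder may be safer.
    for f in "$dir"/*.png; do
        # Assumed output naming: lib1/a.png -> lib1/a_out.png
        ./MySoftyware --input "$f" --out "${f%.png}_out.png"
    done
}

for dir in lib1 lib2 lib3 lib4; do
    process_dir "$dir" &   # four independent streams, one per directory
done
wait                       # return only after all four workers have finished
```

With this layout there are never more than four instances of the program running at once, since each worker runs one file at a time.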

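
For the "keep exactly N jobs running" pattern that pjh's Greg's Wiki link refers to, a rough sketch using bash's `wait -n` (available from bash 4.3) might look like the following. Note that this treats all files as a single queue rather than one stream per directory, so it is a different trade-off from what the question literally asks for. `run_pool` and `max_jobs` are hypothetical names, and the `<name>_out.png` output naming is again an assumption.

```shell
#!/bin/bash
# Sketch: a simple job pool that keeps at most max_jobs commands running.
# Requires bash 4.3 or newer for `wait -n`.
shopt -s nullglob   # unmatched globs expand to nothing
max_jobs=4          # hypothetical setting: number of parallel commands allowed

# run_pool is a hypothetical helper; it takes the input files as arguments.
run_pool() {
    local f running=0
    for f in "$@"; do
        if (( running >= max_jobs )); then
            wait -n                  # block until any one background job exits
            running=$((running - 1))
        fi
        # Assumed output naming: lib1/a.png -> lib1/a_out.png
        ./MySoftyware --input "$f" --out "${f%.png}_out.png" &
        running=$((running + 1))
    done
    wait                             # let the last jobs finish
}

run_pool lib1/*.png lib2/*.png lib3/*.png lib4/*.png
```

As soon as any one of the four running commands exits, the next file in the queue is started, regardless of which directory it comes from.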