82

To maximize CPU usage (I run things on a Debian Lenny instance in EC2), I have a simple script that launches jobs in parallel:

#!/bin/bash

for i in apache-200901*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200902*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200903*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200904*.log; do echo "Processing $i ..."; do_something_important; done &
...

I'm quite satisfied with this working solution; however, I couldn't figure out how to write further code to be executed only once ALL of the loops have been completed.

Is there a way to do this?

mark

5 Answers

130

There's a bash builtin command for that.

wait [n ...]
      Wait for each specified process and return its termination status.
      Each n may be a process ID or a job specification; if a job spec is
      given, all processes in that job's pipeline are waited for. If n is
      not given, all currently active child processes are waited for, and
      the return status is zero. If n specifies a non-existent process or
      job, the return status is 127. Otherwise, the return status is the
      exit status of the last process or job waited for.
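
For the script in the question, a bare wait after the backgrounded loops is all that is needed. A minimal sketch (do_something_important is the question's placeholder):

#!/bin/bash

for i in apache-200901*.log; do echo "Processing $i ..."; do_something_important; done &
for i in apache-200902*.log; do echo "Processing $i ..."; do_something_important; done &
# ... more backgrounded loops ...

wait   # blocks until every currently active background job has exited
echo "All loops finished"
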
eduffy
  • 51
    Hint: use ```wait $(jobs -p)``` to wait for the newly created jobs. – lambacck Apr 27 '16 at 14:31
  • 17
    @lambacck isn't `wait` with no argument equivalent? – Olivier Lalonde May 04 '17 at 06:25
  • 7
    Or use `wait $(jobs -rp)` if you have other jobs backgrounded (such as when you suspended vim with Ctrl+Z): the additional `-r` flag filters out *running* jobs. – Luc May 05 '18 at 22:33
  • 1
    I know this is a Bash question, but in case anyone wants to know the Zsh equivalent, here it is. Zsh `jobs` doesn't have an equivalent `-p` option, so you can use AWK to parse the output. Something like this should work: `wait $( jobs -r | awk '{ gsub("[\\[\\]]", "", "%" $1) ; print "%"$1 ; }' )`. – shadowtalker Aug 09 '23 at 13:26
45

Using GNU Parallel will make your script even shorter and possibly more efficient:

parallel 'echo "Processing "{}" ..."; do_something_important {}' ::: apache-*.log

This will run one job per CPU core and continue to do that until all files are processed.
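
If you want something other than one job per core, the number of simultaneous jobs can be set explicitly. A hedged sketch using GNU Parallel's -j and --eta options (do_something_important is still the placeholder from the question):

parallel -j 8 --eta 'echo "Processing "{}" ..."; do_something_important {}' ::: apache-*.log

Here --eta prints an estimate of the remaining run time while the jobs are running.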

Your solution basically splits the jobs into groups before running them. Here are 32 jobs in 4 groups:

[diagram: simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[diagram: GNU Parallel scheduling]


Ole Tange
  • 2
    Thank you for parallel! – Guilherme Salomé Oct 05 '18 at 21:35
  • 2
    This `parallel --citation` is a bit weird – andras.tim Jun 21 '20 at 20:45
  • 1
    @andras.tim This may help you: http://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt – Ole Tange Jun 21 '20 at 21:21
  • While this is good for CPU-intensive tasks, wouldn't this add more waste for jobs that involve lots of idle time (like ones making web requests)? – b-rad15 Nov 29 '21 at 17:32
  • @b-rad15 If you need to have, say, 250 slow web requests running in parallel, you will waste a little CPU time. But since this CPU would be sitting idle anyway, you are unlikely to notice the loss. The overhead is ~10 ms CPU time per job - which is noticeable for very short jobs, but not a problem for longer running jobs. – Ole Tange Nov 29 '21 at 18:49
  • how to run it from `while read -u 10 p; do ./worker job & done 10< /opt/joblist.txt`? – eri Mar 02 '23 at 07:25
  • 1
    @eri `parallel ./worker job < /opt/joblist.txt`. Spend 15 minutes on reading chapter 1+2 of https://zenodo.org/record/1146014 Your command line will thank you for it. – Ole Tange Mar 02 '23 at 08:43
17

I had to do this recently and ended up with the following solution:

while true; do
  wait -n || {
    code="$?"
    ([[ $code = "127" ]] && exit 0 || exit "$code")
    break
  }
done;

Here's how it works:

wait -n returns as soon as one of the (potentially many) background jobs exits, with that job's exit status. As long as each job succeeds, the loop keeps going, until either:

  1. Exit code 127: there are no background jobs left to wait for. In that case, we ignore the exit code and exit the sub-shell with code 0.
  2. One of the background jobs failed. We just exit the sub-shell with that job's exit code.

With set -e, this will guarantee that the script will terminate early and pass through the exit code of any failed background job.
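
A minimal, self-contained usage sketch (do_work, the sleep durations and the job count are made up for illustration; wait -n requires bash 4.3 or newer):

#!/bin/bash
set -e   # needed so a failing background job terminates the whole script

do_work() {              # hypothetical stand-in for a real task
    sleep "$1"
    echo "job $1 done"
}

do_work 1 &
do_work 2 &
do_work 3 &

# Reap the background jobs one at a time: 127 means nothing is left to wait
# for, any other non-zero status makes set -e abort with that job's code.
while true; do
  wait -n || {
    code="$?"
    ([[ $code = "127" ]] && exit 0 || exit "$code")
    break
  }
done

echo "All background jobs finished successfully"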

Olivier Lalonde
4

A minimal example with wait $(jobs -p):

  for i in {1..3}
  do
    (echo "process $i started" && sleep 5 && echo "process $i finished")&
  done  

  sleep 0.1 # For sequential output
  echo "Waiting for processes to finish" 
  wait $(jobs -p)
  echo "All processes finished"

Example output:

process 1 started
process 2 started
process 3 started
Waiting for processes to finish
process 2 finished
process 1 finished
process 3 finished
All processes finished
schnatterer
1

This is my crude solution:

function run_task {
        cmd=$1
        output=$2
        concurrency=$3
        if [ -f "${output}.done" ]; then
                # experiment already run
                echo "Command already run: $cmd. Found output $output"
                return
        fi
        count=$(jobs -p | wc -l)
        echo "New active task #$count:  $cmd > $output"
        # run the task in the background and mark it as done on success
        $cmd > "$output" && touch "$output.done" &
        # if we are at the concurrency limit, wait for a worker to exit
        stop=$((count >= concurrency))
        while [ "$stop" -eq 1 ]; do
                echo "Waiting for $count worker threads..."
                sleep 1
                count=$(jobs -p | wc -l)
                stop=$((count > concurrency))
        done
}

The idea is to use "jobs" to see how many children are active in the background and to wait until this number drops (a child exits). Once a child exits, the next task can be started.

As you can see, there is also a bit of extra logic to avoid running the same experiments/commands multiple times. It does the job for me. However, this logic could either be skipped or further improved (e.g., checking file creation timestamps, input parameters, etc.). An example invocation is sketched below.
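
A hypothetical invocation, assuming a gzip job per log file and a limit of 4 concurrent tasks, could look like this:

for f in apache-*.log; do
        run_task "gzip -c $f" "$f.gz" 4
done
wait   # let the last batch of background jobs finish

The trailing wait is needed because run_task only throttles job creation; it does not wait for the final jobs itself.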

Radu