
I need a bash script to run some jobs in the background, three jobs at a time.

I know I can do this in the following way; for illustration, I will assume the number of jobs is 6:

./j1 &
./j2 &
./j3 &
wait
./j4 &
./j5 &
./j6 &
wait

However, this way, if for example j2 takes a lot longer to run than j1 and j3, I will be stuck with only one background job running for a long time.

The alternative (which is what I want) is that whenever one job completes, bash should start the next job in the queue, so that three jobs are running at any given time. Is it possible to write a bash script to implement this alternative, possibly using a loop? Please note that I need to run far more jobs, and I expect this alternative method to save me a lot of time.

Here is my draft of the script, which I hope you can help me verify and improve, as I'm new to scripting in bash. The ideas in this script are taken and modified from here, here, and here:

for i in $(seq 6)
do
   # wait here if the number of jobs is 3 (or more)
   while (( $(jobs -p | wc -l) >= 3 ))
   do 
      sleep 5      # check again after 5 seconds
   done

   ./j$i &
done
wait

I think this script implements the required behavior. However, I need to know, from bash experts, whether I'm doing something wrong or whether there is a better way of implementing this idea.

Thank you very much.

  • Which version of bash, *specifically*? – Charles Duffy Feb 13 '17 at 18:43
  • BTW, `seq` is generally bad form. It's not specified by POSIX and also not built into bash, so there's no particular reason to believe it'll be present on a given system. Use a [C-style `for` loop](http://wiki.bash-hackers.org/syntax/ccmd/c_for) instead – Charles Duffy Feb 13 '17 at 18:44
  • ...to be specific as to why I asked about the version -- modern bash has a `wait -n` flag that waits for only one job to exit. – Charles Duffy Feb 13 '17 at 18:45
    (Personally, I'm wary of relying on big chunks of perl, so I use the smaller, simpler and admittedly-less-capable `xargs -P` rather than parallel. That said, positions do differ on that count). – Charles Duffy Feb 13 '17 at 18:46
  • Charles: GNU bash, version 4.3.46(1)-release (x86_64-pc-linux-gnu) – user8420488483439 Feb 13 '17 at 18:47
    Then you've got `wait -n`, making your life very easy. – Charles Duffy Feb 13 '17 at 18:47
  • @CharlesDuffy does that help though? You don't know which job is going to finish first, you might end up waiting for a job to finish while other jobs have already finished. You want a poll, or a wait with a timeout. – SpoonMeiser Feb 13 '17 at 18:48
  • @SpoonMeiser, it waits for *any* job to exit, not a specific one. That's the whole purpose of having `wait -n`, as opposed to `wait "$somepid"` – Charles Duffy Feb 13 '17 at 18:48
    chepner: parallel is not installed in my system, and unfortunately, I don't have permissions to install packages. – user8420488483439 Feb 13 '17 at 18:49
  • Yeah, I misread the docs. Ignore me. – SpoonMeiser Feb 13 '17 at 18:50
  • @CharlesDuffy Can you please elaborate how to use this wait -n in my case? Maybe how to use it in a script? – user8420488483439 Feb 13 '17 at 18:52
  • BTW, folks interested in this question should probably read [ProcessManagement](http://mywiki.wooledge.org/ProcessManagement). – Charles Duffy Feb 13 '17 at 19:06
    You don't need to install any packages to run `parallel` - it is just a Perl script (like one you might write yourself) and IMHO it is almost certainly the best, and simplest, way to run your jobs - especially if the duration varies wildly. – Mark Setchell Feb 13 '17 at 19:12

4 Answers


With GNU xargs:

printf '%s\0' j{1..6} | xargs -0 -n1 -P3 sh -c './"$1"' _

With bash (4.x) builtins:

max_jobs=3; cur_jobs=0
for ((i=1; i<=6; i++)); do
  # If true, wait until the next background job finishes to continue.
  ((cur_jobs >= max_jobs)) && wait -n
  # Increment the current number of jobs running.
  ./j"$i" & ((++cur_jobs))
done
wait

Note that the approach relying on builtins has some corner cases -- if you have multiple jobs exiting at the exact same time, a single wait -n can reap several of them, thus effectively consuming multiple slots. If we wanted to be more robust, we might end up with something like the following:

max_jobs=3
declare -A cur_jobs=( ) # build an associative array w/ PIDs of jobs we started
for ((i=1; i<=6; i++)); do
  if (( ${#cur_jobs[@]} >= max_jobs )); then
    wait -n # wait for at least one job to exit
    # ...and then remove any jobs that aren't running from the table
    for pid in "${!cur_jobs[@]}"; do
      kill -0 "$pid" 2>/dev/null || unset "cur_jobs[$pid]"
    done
  fi
  ./j"$i" & cur_jobs[$!]=1
done
wait

...which is obviously a lot of work, and still has a minor race. Consider using xargs -P instead. :)
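For a self-contained illustration of the `xargs -P` pattern, the sketch below replaces the real `./j1` .. `./j6` scripts with a hypothetical inline worker (a short sleep plus an echo), so it can be run anywhere:

```shell
# Run six hypothetical "jobs" with at most three in flight at once.
# Each job here is just a short sleep followed by an echo; in practice,
# substitute your real ./jN scripts for the sh -c body.
printf '%s\0' 1 2 3 4 5 6 |
  xargs -0 -n1 -P3 sh -c 'sleep 0.2; echo "job $1 done"' _
```

With `-P3`, the six jobs complete in roughly two batches rather than six sequential runs; the output order is whatever order the jobs happen to finish in.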

Charles Duffy
  • Your first `bash` loop never decrements `curr_jobs`, nor can it effectively. – chepner Feb 13 '17 at 19:09
  • @chepner, it doesn't need to do a decrement. We're just making sure that after we've started three, we have one to reap for every additional one we start. If you care about the case when `wait -n` is actually reaping more than one job... that's what the second version is for. – Charles Duffy Feb 13 '17 at 19:09

Using GNU Parallel:

parallel -j3 ::: ./j{1..6}

Or if your shell does not do .. expansion (e.g. csh):

seq 6 | parallel -j3 ./j'{}'

If you think you cannot install GNU Parallel, please read http://oletange.blogspot.dk/2013/04/why-not-install-gnu-parallel.html and leave a comment on why you cannot install it.

Ole Tange

Maybe this approach could help.

Sample use case: run 'sleep 20' 30 times, just as an example; it could be any job or another script. The control logic, inside a while loop, keeps checking whether the number of jobs already fired is less than the configured maximum. If it is, it fires another; if not, it sleeps 0.5 seconds.

Script output: In the snippet below, observe that 30 'sleep 20' commands are now running in the background, since we configured max=30.

%_Host@User> ps -ef|grep 'sleep 20'|grep -v grep|wc -l
30
%_Host@User>

Changing the number of jobs at runtime: the script has a parameter "max", which takes its value from a file "max.txt" (max=$(cat max.txt)) and re-reads it in each iteration of the while loop. As seen below, after we changed it to 45, there are 45 'sleep 20' commands running in the background. You can put the main script in the background and just keep changing the max value inside "max.txt" to control the concurrency.

%_Host@User> cat > max.txt
45
^C
%_Host@User> ps -ef|grep 'sleep 20'|grep -v grep|wc -l
45
%_Host@User>

Script:

#!/bin/bash
#---------------------------------------------------------------------#
proc='sleep 20' # Your process or script or anything..
max=$(cat max.txt)  # configure how many jobs do you want
curr=0
#---------------------------------------------------------------------#
while true
do
  curr=$(ps -ef | grep "$proc" | grep -v grep | wc -l)
  max=$(cat max.txt)
  while [[ $curr -lt $max ]]
  do
    ${proc} &               # Sending process to background.
    max=$(cat max.txt)      # After starting a job, re-read max and recount.
    curr=$(ps -ef | grep "$proc" | grep -v grep | wc -l)
  done
  sleep .5                  # At max jobs; check again in half a second.
done
#---------------------------------------------------------------------#

Let us know if this was useful.

User9102d82
  • `${proc} &` has all the bugs described in [BashFAQ #50](http://mywiki.wooledge.org/BashFAQ/050). – Charles Duffy Feb 13 '17 at 20:42
  • ...there are a number of other issues here as well. Many processes spawn subshells -- if you spawn a script that runs two subshells, then every directly-spawned copy will count as three in your `wc -l`. The `ps` approach counts not just direct children of this job, but all processes across the entire system. Not all jobs match themselves as a regex -- if you run `find '/incoming/[a-zA-Z]*.d' -mindepth 1 -maxdepth 1 -exec my-process`, then the `[a-zA-Z]` in the grep won't match the `[` character in the associated position in the ps output. – Charles Duffy Feb 13 '17 at 20:47
  • Thanks Charles. I am gonna check what you said, hopefully learn something new. Cheers! – User9102d82 Feb 13 '17 at 20:59

This is how I do it:

  1. Enable jobs in our script:

    set -m
    
  2. Create a trap which kills all jobs if the script is interrupted:

    trap 'jobs -p | xargs kill 2>/dev/null;' EXIT
    
  3. Use a loop to start a maximum of 3 jobs in background

    for i in $(seq 6); do
      while [[ $(jobs | wc -l) -ge 3 ]]; do
        sleep 5
      done
      ./j"$i" &
    done
    
  4. Finally bring our background jobs back to the foreground:

    while fg >/dev/null 2>&1; do
      echo -n "" # output nothing
    done
    

Because of the last part, the script does not exit as long as jobs are running, and it prevents the jobs from being killed by the trap.
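Putting the steps above together into one script, with a hypothetical worker function standing in for the real ./j1 .. ./j6 so the sketch is self-contained:

```shell
#!/bin/bash
set -m                                        # 1. enable job control

# 2. kill any remaining background jobs if the script is interrupted
trap 'jobs -p | xargs kill 2>/dev/null;' EXIT

# Hypothetical stand-in for ./j1 .. ./j6; replace with your real jobs.
worker() { sleep 0.2; echo "job $1 done"; }

# 3. start jobs, keeping at most 3 in the background at once
for i in $(seq 6); do
  while [[ $(jobs | wc -l) -ge 3 ]]; do
    sleep 0.1
  done
  worker "$i" &
done

# 4. bring the remaining background jobs to the foreground one by one
while fg >/dev/null 2>&1; do
  echo -n "" # output nothing
done
```

The sleep intervals are shortened here only so the sketch finishes quickly; with real long-running jobs the original 5-second poll is fine.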

mgutt