48

Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making the & block when there are more than n concurrent jobs running in the background.

I know I can implement this with ps | grep-style tricks, but is there an easier way?

static_rtti
  • 3
    I think this question might help you: http://stackoverflow.com/questions/38160/parallelize-bash-script – Tom Ritter Oct 08 '09 at 14:09
  • 2
    So, many convoluted answers, but no way to tell bash "maximum ten concurrent jobs!". I guess there isn't one then. Too bad, that would really be a nice feature. – static_rtti Oct 08 '09 at 14:21

14 Answers

37

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed, you can do this:

parallel gzip ::: *.log

which will run one gzip per CPU core until all logfiles are gzipped.
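
If you need an explicit cap, such as the "maximum ten concurrent jobs" from the question, rather than one job per core, -j also accepts a fixed number (an illustrative variant of the same command):

parallel -j 10 gzip ::: *.log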

If it is part of a larger loop you can use sem instead:

for i in *.log ; do
    echo "Do more stuff with $i here"
    sem -j+0 gzip "$i" ";" echo done
done
sem --wait

It will do the same, but give you a chance to do more stuff for each file.

If GNU Parallel is not packaged for your distribution, you can install it simply by:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

It will download, check signature, and do a personal installation if it cannot install globally.

Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Ole Tange
  • 2
    This is amazing - the parallel command is great too, you don't even need to do the loop. – frabcus Apr 29 '14 at 19:38
  • The `:::` syntax is obsolescent, though there is an option to enable it for backwards compatibility which some distros enable by default (somewhat oddly, because then the examples in the manual won't work out of the box). – tripleee Mar 18 '16 at 19:27
  • 2
    @tripleee ::: has been supported since 2010722 and will be in the foreseeable future. Your installation may, however, be trying to emulate Tollef's parallel without telling you - which explains why you find it odd. Removing /etc/parallel/config should fix your issue. – Ole Tange Mar 19 '16 at 09:41
  • 2
    There’s a very annoying citation blurb in the output though. It doesn’t even make sense, unless one is pro-copyright… which would be strange for a GNU tool. → You can remove it by using the `--will-cite` argument for `sem`. What were they thinking? – Evi1M4chine Mar 22 '17 at 19:29
  • Any reason you do not run '--bibtex' or '--citation' one single time (as it suggests you do)? – Ole Tange Mar 22 '17 at 23:31
  • Responsible disclosure: @OleTange authored `parallel` (and its helper `sem`). P.S.: Thanks for an excellent program! – bishop Nov 03 '17 at 17:31
  • 1
    @Evi1M4chine This is pro-attribution, not pro-copyright. GNU is very much for pro-attribution with 'viral' licensing. Basically you used something that someone freely gives away with the condition you give them a little bit of credit. In this case the software was developed for scientific purposes and citations allow the author to keep track if they are making a difference and if they should get funding in the future to continue their work. The work is still freely available, even in source, and can be derived. – coderforlife Dec 31 '17 at 21:15
  • 1
    @coderforlife: The point is that "licensing" is a delusional concept, not in touch with reality. It requires everyone to adhere to the "license", even behind your back, and even without you ever knowing, when the possibility of passing information on arises. Which is actually even *physically* impossible, due to the possibility of it happening outside one’s light cone, making a causal connection impossible. But in practice, "licenses" are not enforceable, without a perfect TPM chip in every brain and device. Something that we all hope will never happen. – Evi1M4chine Jan 05 '18 at 01:35
29

A small bash script could help you:

# content of script exec-async.sh
# Block until fewer than 3 jobs are running in the background, then launch "$@".
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
    sleep 1
    joblist=($(jobs -p))
done
"$@" &

If you call:

. exec-async.sh sleep 10

...four times, the first three calls will return immediately; the fourth call will block until fewer than three jobs are running.

You need to start this script inside the current session by prefixing it with ., because jobs lists only the jobs of the current session.

The sleep inside is ugly, but I didn't find a way to wait for the first job that terminates.
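
As the comments below note, newer bash (4.3+) has wait -n, which removes the need for the sleep; a minimal sketch of the same helper under that assumption:

# content of a hypothetical exec-async-waitn.sh (requires bash >= 4.3)
while (( $(jobs -pr | wc -l) >= 3 ))
do
    wait -n    # block until any background job of this session exits
done
"$@" &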

Dennis Williamson
tangens
  • 2
    The child processes will become zombies; somewhere a wait should occur. – torbatamas Aug 01 '17 at 16:44
  • 1
    A decade later... Bash's `wait` command now has `-n`. If you run `wait -n`, it will wait for the *n*ext job (in this session) to exit regardless of PID. :) – dannysauer Aug 02 '21 at 20:06
29

The following script shows a way to do this with functions. You can either put the bgxupdate() and bgxlimit() functions in your script, or have them in a separate file which is sourced from your script with:

. /path/to/bgx.sh

It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).

It uses the Bash built-in jobs to get a list of sub-processes but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit() function:

  1. Set up an empty group variable.
  2. Transfer that to bgxgrp.
  3. Call bgxlimit() with the limit and command you want to run.
  4. Transfer the new group back to your group variable.

Of course, if you only have one group, just use bgxgrp variable directly rather than transferring in and out.
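
For example, with a single group the calling pattern collapses to something like this (an illustrative sketch; gzip on *.log is an assumed workload, and bgxlimit() is defined in the script below):

bgxgrp=""                    # the one and only group
for f in *.log; do
    bgxlimit 3 gzip "$f"     # at most three gzips at a time
done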

#!/bin/bash

# bgxupdate - update active processes in a group.
#   Works by transferring each process to new group
#   if it is still active.
# in:  bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.

bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    ((bgxcount = 0))
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount++))
        fi
    done
}

# bgxlimit - start a sub-process with a limit.
#   Loops, calling bgxupdate until there is a free
#   slot to run another sub-process. Then runs it
#   and updates the process group.
# in:  $1     - the limit on processes.
# in:  $2+    - the command to run for new process.
# in:  bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes

bgxlimit() {
    bgxmax=$1; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]]; do
        sleep 1
        bgxupdate
    done
    if [[ "$1" != "-" ]]; then
        "$@" &
        bgxgrp="${bgxgrp} $!"
    fi
}

# Test program, create group and run 6 sleeps with
#   limit of 3.

group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6; do
    bgxgrp=${group1}; bgxlimit 3 sleep ${i}0; group1=${bgxgrp}
    echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done

# Wait until all others are finished.

echo
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]]; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]]; do
        sleep 1
        bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
    done
    echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done

Here’s a sample run, with blank lines inserted to clearly delineate different time points:

0 12:38:00 [ ]
1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]

4 12:38:10 [ 5880 2524 1560 ]

5 12:38:20 [ 2524 1560 5032 ]

6 12:38:30 [ 1560 5032 5212 ]

9 12:38:50 [ 5032 5212 ]

9 12:39:10 [ 5212 ]

9 12:39:30 [ ]
  • The whole thing starts at 12:38:00 (time t = 0) and, as you can see, the first three processes run immediately.
  • Each process sleeps for 10n seconds and the fourth process doesn’t start until the first exits (at time t = 10). You can see that process 3368 has disappeared from the list before 1560 is added.
  • Similarly, the fifth process 5032 starts when 5880 (the second) exits at time t = 20.
  • And finally, the sixth process 5212 starts when 2524 (the third) exits at time t = 30.
  • Then the rundown begins: the fourth process exits at time t = 50 (started at 10 with 40 duration).
  • The fifth exits at time t = 70 (started at 20 with 50 duration).
  • Finally, the sixth exits at time t = 90 (started at 30 with 60 duration).

Or, if you prefer it in a more graphical time-line form:

Process:  1  2  3  4  5  6 
--------  -  -  -  -  -  -
12:38:00  ^  ^  ^            1/2/3 start together.
12:38:10  v  |  |  ^         4 starts when 1 done.
12:38:20     v  |  |  ^      5 starts when 2 done.
12:38:30        v  |  |  ^   6 starts when 3 done.
12:38:40           |  |  |
12:38:50           v  |  |   4 ends.
12:39:00              |  |
12:39:10              v  |   5 ends.
12:39:20                 |
12:39:30                 v   6 ends.
paxdiablo
22

Here's the shortest way:

waitforjobs() {
    while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}

Call this function before forking off any new job:

waitforjobs 10
run_another_job &

To have as many background jobs as cores on the machine, use $(nproc) instead of a fixed number like 10.
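
For instance, a calling loop might look like this (a sketch; gzip on *.log is just an assumed workload):

for f in *.log; do
    waitforjobs "$(nproc)"   # block while nproc jobs are already running
    gzip "$f" &
done
wait                         # wait for the remaining jobs to finish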

Scarabeetle
16

Assuming you'd like to write code like this:

for x in $(seq 1 100); do     # 100 things we want to put into the background.
    max_bg_procs 5            # Define the limit. See below.
    your_intensive_job &
done

Where max_bg_procs should be put in your .bashrc:

function max_bg_procs {
    if [[ $# -eq 0 ]] ; then
        echo "Usage: max_bg_procs NUM_PROCS.  Will wait until the number of background (&)"
        echo "           bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
        return
    fi
    local max_number=$((0 + ${1:-0}))
    while true; do
        local current_number=$(jobs -pr | wc -l)
        if [[ $current_number -lt $max_number ]]; then
            break
        fi
        sleep 1
    done
}
Aaron McDaid
  • 1
    I found that I needed to use "jobs -pr" rather than simply "jobs -p" otherwise it never finished the last job, and wouldn't proceed past the first job if I set the limit to 1 job at a time. – BenjaminBallard Feb 23 '15 at 22:07
7

The following function was developed from tangens' answer above (either copy it into your script or source it from a separate file):

job_limit () {
    # Test for single positive integer input
    if (( $# == 1 )) && [[ $1 =~ ^[1-9][0-9]*$ ]]
    then

        # Check number of running jobs
        joblist=($(jobs -rp))
        while (( ${#joblist[*]} >= $1 ))
        do

            # Wait for any job to finish
            command='wait '${joblist[0]}
            for job in ${joblist[@]:1}
            do
                command+=' || wait '$job
            done
            eval $command
            joblist=($(jobs -rp))
        done
   fi
}

1) Only requires inserting a single line to limit an existing loop

while :
do
    task &
    job_limit `nproc`
done

2) Waits on completion of existing background tasks rather than polling, increasing efficiency for fast tasks

user3769065
  • If there are 10 jobs started, then job_limit waits for all 10 to finish, before starting another 10 jobs, right? – Tomas M Jun 22 '19 at 14:51
  • No, jobs are started one by one. The job_limit function blocks when running jobs hit the limit. There is no batching. – user3769065 Jul 02 '19 at 21:42
6

This might be good enough for most purposes, but is not optimal.

#!/bin/bash

n=0
maxjobs=10

for i in *.m4a ; do
    # ( DO SOMETHING ) &

    # limit jobs
    if (( ++n % maxjobs == 0 )) ; then
        wait # wait until all have finished (not optimal, but most times good enough)
        echo $n wait
    fi
done
cat
  • What's not optimal about it? – naught101 Sep 16 '16 at 05:32
  • 5
    You start 10 jobs then wait for all 10 to finish before starting another 10 jobs. Some of the time you have only 1 job running instead of 10. This is not good if you have slow and fast jobs mixed together. – cat Sep 21 '16 at 08:49
4

If you're willing to do this outside of pure bash, you should look into a job queuing system.

For instance, there's GNU queue or PBS. And for PBS, you might want to look into Maui for configuration.

Both systems will require some configuration, but it's entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems would be used on supercomputing clusters, where you would want to allocate a specific amount of memory or computing time to any given batch job; however, there's no reason you can't use one of these on a single desktop computer without regard for compute time or memory limits.
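
As a rough illustration of the idea (this assumes a configured PBS/Torque installation whose server is set up to run only a limited number of jobs at once; the gzip workload is made up), submitting work then just means queuing it:

for f in *.log; do
    echo "cd \$PBS_O_WORKDIR && gzip '$f'" | qsub    # one queued batch job per file
done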

Mark Rushakoff
2

This is hard to do without wait -n (for example, the shell in BusyBox does not support it). So here is a workaround. It is not optimal, because it calls the 'jobs' and 'wc' commands 10x per second; you can reduce that to 1x per second, for example, if you don't mind each new job waiting a bit longer to start.

# $1 = maximum concurrent jobs
#
limit_jobs()
{
   while true; do
      if [ "$(jobs -p | wc -l)" -lt "$1" ]; then break; fi
      usleep 100000
   done
}

# and now start some tasks:

task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
wait
Tomas M
1

On Linux I use this to limit the number of bash background jobs to the number of available CPUs (possibly overridden by setting CPU_NUMBER).

[ "$CPU_NUMBER" ] || CPU_NUMBER="`nproc 2>/dev/null || echo 1`"

while [ "$1" ]; do
    {
        # ... do something with "$1" in parallel here ...

        echo "[$# items left] $1 done"
    } &

    while true; do
        # load the PIDs of all child processes to the array
        joblist=(`jobs -p`)
        if [ ${#joblist[*]} -ge "$CPU_NUMBER" ]; then
            # when the job limit is reached, wait for *single* job to finish
            wait -n
        else
            # stop checking when we're below the limit
            break
        fi
    done
    # it's great we executed zero external commands to check!

    shift
done

# wait for all currently active child processes
wait
Tuttle
1

The wait command with the -n option waits for the next job to terminate.

maxjobs=10
# Wait until the number of background jobs drops below $maxjobs
jobIds=($(jobs -p))
len=${#jobIds[@]}
while [ "$len" -ge "$maxjobs" ]; do
    # Wait until any one job finishes
    wait -n
    jobIds=($(jobs -p))
    len=${#jobIds[@]}
done
0

Have you considered starting ten long-running listener processes and communicating with them via named pipes?
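
A sketch of a related named-pipe trick (not literal listener processes; N, *.log and gzip are only illustrative): keep N tokens in a FIFO and take one before launching each job, so launches block once N jobs are running.

#!/bin/bash
N=10
fifo=$(mktemp -u) && mkfifo "$fifo"
exec 3<>"$fifo"              # open read-write so neither end blocks
rm -f "$fifo"                # the open descriptor keeps the pipe alive

for ((i = 0; i < N; i++)); do
    printf '\n' >&3          # one token per allowed concurrent job
done

for f in *.log; do
    read -r -u 3             # take a token; blocks while N jobs are running
    {
        gzip "$f"
        printf '\n' >&3      # return the token when the job finishes
    } &
done
wait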

Steven Huwig
0

You can use ulimit -u; see http://ss64.com/bash/ulimit.html

Shay
  • 1
    The only problem with this is it will cause the processes to die rather than block and wait which is the desired behavior. – Benj Nov 05 '09 at 11:17
  • 1
    This solution is dangerous and hard to control. Since my shell scripts tend to contain a lot of subshell expansion and piping, each line typically needs 4+ processes. When you set the ulimit of the entire process, it not just limits how many jobs can execute, it also limits things necessary for the execution of the rest of the script, causing things to block/fail in an unpredictable way. – amphetamachine Mar 22 '10 at 02:36
  • but it is the only way you can enforce the limit on users – Diego Torres Milano Dec 31 '20 at 19:38
0

Bash mostly processes files line by line, so you can split the input file into chunks of N lines; then a simple pattern is applicable:

mkdir tmp ; pushd tmp ; split -l 50 ../mainfile.txt
for file in * ; do
    while read -r a b c ; do
        curl -s "http://$a/$b/$c" &
    done < "$file"
    wait    # wait for this batch of up to 50 requests to finish
done
popd ; rm -rf tmp
Daniil Iaitskov
  • If I read this correctly, this runs a batch of 50, then waits until all of them are done before starting another batch. Ideally we should find a way to have 50 concurrent processes running at all times. (GNU `parallel` does that easily, `xargs` with some steady persuading.) – tripleee Oct 05 '20 at 17:52