
I recently posted a question asking if it was possible to prevent PIDs from being reused.

So far the answer appears to be no. (Which is fine.)

However, the user Diego Torres Milano added an answer to that question, and my question here is in regards to that answer.

Diego answered,

If you are afraid of reusing PID's, which won't happen if you wait as other answers explain, you can use

echo 4194303 > /proc/sys/kernel/pid_max

to decrease your fear ;-)

I don't actually understand why Diego has used the number 4194303 here, but that's another question.
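For what it's worth, 4194303 is 2^22 - 1. On 64-bit Linux the kernel caps pid_max at 2^22 (4194304), so the command raises the limit to roughly its maximum, which makes PID wraparound (and therefore reuse) much slower. A quick check of the arithmetic; the /proc part assumes a Linux system:

```bash
#!/usr/bin/env bash
# 4194303 expressed as a power of two: 2^22 - 1
limit=$(( (1 << 22) - 1 ))
echo "2^22 - 1 = ${limit}"

# Show the current setting if this is Linux (reading it needs no root)
if [[ -r /proc/sys/kernel/pid_max ]]; then
  echo "current pid_max: $(< /proc/sys/kernel/pid_max)"
fi
```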

My understanding was that I had a problem with the following code:

for pid in "${PIDS[@]}"
do
    wait "$pid"
done

The problem is that I have multiple PIDs in an array, and the for loop runs the wait command on each PID in turn, but I cannot predict that the processes will finish in the same order in which their PIDs are stored in the array.

For example, the following could happen:

  • Start waiting for the PID in array index 0
  • The process whose PID is in index 1 of the array terminates
  • New job(s) start on the system, and the PID stored in index 1 of the array is reused for another process
  • wait returns as the process with the PID in array index 0 exits
  • Start waiting for the PID in array index 1, except this is now a different process and we have no idea what it is
  • The process that reused the PID wait is now waiting for never terminates. Perhaps it is a mail server or something else a system administrator has started.
  • wait keeps waiting until the next serious Linux bug is found and the system is rebooted, or there is a power outage

Diego said:

which won't happen if you wait as other answers explain

i.e., that the situation I have described above cannot happen.

Is Diego correct?

  • If so, why can the situation I described above not occur?

Or is Diego not correct?

  • If so, well, then I will post a new question later today...

Additional notes

It has occurred to me that this question might be confusing unless you are aware that the PIDs are those of processes launched in the background, i.e.:

my_function &
PID="$!"
PIDS+=("$PID")
FreelanceConsultant

  • Have you read the bash documentation for wait? It accepts more than one pid. – Bjorn A. Nov 02 '16 at 10:30
  • @BjornA. Yes I have. The documentation says it accepts 1 PID only. – FreelanceConsultant Nov 02 '16 at 10:31
  • Quote from (my) `man bash`: `wait [-n] [n ...] Wait for each specified child process` – Aaron Nov 02 '16 at 10:32
  • wait [-n] [n ...] Wait for each specified child process and return its termination status. Each n may be a process ID or a job specification; if a job spec is given, all processes in that job's pipeline are waited for. If n is not given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If n specifies a non-existent process or job, the return status is 127. Otherwise, the return status is the exit status of the last process or job waited for. – Bjorn A. Nov 02 '16 at 10:33
  • Okay thanks I misunderstood from reading this webpage: http://www.tldp.org/LDP/abs/html/x9644.html – FreelanceConsultant Nov 02 '16 at 10:33
  • @Aaron Can I use an array as an argument to `wait`? – FreelanceConsultant Nov 02 '16 at 10:34
  • No, it's the shell parsing ability that comes into action here, you have to provide an IFS-separated list of PIDs. However, using `${array[@]}` could produce such a list. – Aaron Nov 02 '16 at 10:35
  • Keep in mind that you cannot wait for a pid that is not a child of the current script/shell process - thus it's not possible to wait for a random pid that got reused elsewhere. The bash wait command also says: "If n is not given, all currently active child processes are waited for" , i.e. you can call `wait` with no arguments. – nos Nov 02 '16 at 10:35
  • Also consider using `wait` without arguments as proposed in your other question if you want to wait on all the process you've spawned from your main script. It would eliminate the need to maintain a PID list. – Aaron Nov 02 '16 at 10:38
  • Most of this question has already been answered in your previous question: https://stackoverflow.com/a/40360239/1640661 – Anthony Geoghegan Nov 02 '16 at 10:42
  • @nos that's fine I only intend to wait for child processes – FreelanceConsultant Nov 02 '16 at 10:47
  • @Aaron Thanks I wasn't aware of that – FreelanceConsultant Nov 02 '16 at 10:47
  • @AnthonyGeoghegan If it was then it wasn't clear to me – FreelanceConsultant Nov 02 '16 at 10:47
  • @Aaron So if I convert my PIDs to a space-separated string of PIDs, i.e., a string, then that will work? – FreelanceConsultant Nov 02 '16 at 10:48
  • Yes, you should be able to wait for a specific subset of your child processes with a single wait, without using a loop – Aaron Nov 02 '16 at 10:55
  • I'm voting to close this question as off-topic because it's turning into a chat forum in the comments. Maybe this discussion could be moved to [chat](http://chat.stackoverflow.com/)? – larsks Nov 02 '16 at 11:32
  • @user3728501, ...btw, the ABS is generally poorly-regarded as a reference -- it's infrequently updated and often showcases bad practices in examples. Consider the bash-hackers wiki (its page on `wait` is [here](http://wiki.bash-hackers.org/commands/builtin/wait)), or the Wooledge wiki [SignalTrap](http://mywiki.wooledge.org/SignalTrap) page. – Charles Duffy Jan 11 '17 at 05:01

4 Answers


Let's go through your options.

Wait for all background jobs, unconditionally

for i in 1 2 3 4 5; do
    cmd &
done
wait

This has the benefit of being simple, but you can't keep your machine busy. If you want to start new jobs as old ones complete, you can't. Your machine gets less and less utilized until all the background jobs complete, at which point you can start a new batch of jobs.

Related is the ability to wait for a subset of jobs by passing multiple arguments to wait:

unrelated_job &
for i in 1 2 3 4 5; do
  cmd & pids+=($!)
done
wait "${pids[@]}"   # Does not wait for unrelated_job, though
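When wait is given several PIDs, it waits for each of them and, per the bash manual text quoted in the comments above, returns the exit status of the last one waited for. A small sketch with made-up exit codes:

```bash
#!/usr/bin/env bash
pids=()
for code in 3 5 7; do            # illustrative exit codes
  ( exit "$code" ) & pids+=("$!")
done

wait "${pids[@]}"                # waits for all three PIDs
status=$?
echo "wait returned ${status}"   # status of the last PID listed: 7
```

If you need every status, not just the last one, wait on each PID separately.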

Wait for individual jobs in arbitrary order

for i in 1 2 3 4 5; do
   cmd & pids+=($!)
done

for pid in "${pids[@]}"; do
   wait "$pid"
   # do something when a job completes
done

This has the benefit of letting you do work after a job completes, but still has the problem that jobs other than $pid might complete first, leaving your machine underutilized until $pid actually completes. You do, however, still get the exit status for each individual job, even if it completes before you actually wait for it.
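A sketch of exactly the situation from the question: the job in index 1 exits while the loop is still waiting for index 0, yet waiting on it afterwards still reports its status. The sleep duration and exit codes are arbitrary, and the associative array needs bash 4 or later:

```bash
#!/usr/bin/env bash
pids=()
( sleep 1; exit 0 ) & pids+=("$!")   # index 0: slow job
( exit 3 ) & pids+=("$!")            # index 1: finishes almost immediately

declare -A status_of                 # PID -> exit status
for pid in "${pids[@]}"; do
  wait "$pid"                        # index 1 has long finished when we get here
  status_of[$pid]=$?
done

for pid in "${pids[@]}"; do
  echo "job ${pid} exited with ${status_of[$pid]}"
done
```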

Wait for the next job to complete (bash 4.3 or later)

for i in 1 2 3 4 5; do
   cmd & pids+=($!)
done

for pid in "${pids[@]}"; do
   wait -n
   # do something when a job completes
done

Here, you can wait until a job completes, which means you can keep your machine as busy as possible. The only problem is that you don't necessarily know which job completed unless you use jobs to get the list of active processes and compare it to pids.
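One way to use wait -n to keep the machine busy is a small job pool that starts new work whenever a slot frees up. A minimal sketch, assuming bash 4.3 or later; max_jobs, tasks, and run_task are hypothetical placeholders for your own limit and work:

```bash
#!/usr/bin/env bash
max_jobs=3                   # hypothetical concurrency limit
tasks=(1 2 3 4 5 6 7 8)      # hypothetical work items

run_task() {                 # hypothetical stand-in for real work
  sleep 1
  echo "task $1 done"
}

running=0
started=0
for t in "${tasks[@]}"; do
  if (( running >= max_jobs )); then
    wait -n                  # block until any one job finishes
    running=$((running - 1))
  fi
  run_task "$t" &
  running=$((running + 1))
  started=$((started + 1))
done
wait                         # then wait for whatever is still running
```

The exit status of each wait -n is ignored here; check $? right after it if failures matter.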

Other options?

The shell by itself is not an ideal platform for doing job distribution, which is why there are a multitude of programs designed for managing batch jobs: xargs, parallel, slurm, qsub, etc.
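As an example of handing the scheduling to one of those tools, xargs -P keeps a fixed number of commands running and starts the next one as soon as a slot frees up. -P is supported by GNU and BSD xargs but is not part of POSIX; the sh -c body is a placeholder for real work:

```bash
#!/usr/bin/env bash
# Run at most 2 placeholder tasks at a time, one argument per command (-n 1).
# The trailing "_" becomes $0 inside sh -c, so each item arrives as $1.
out=$(printf '%s\n' 1 2 3 4 5 |
  xargs -P 2 -n 1 sh -c 'sleep 1; echo "task $1 done"' _)
status=$?   # with GNU xargs: 0 if every command succeeded, 123 if any exited with 1-125
printf '%s\n' "$out"
```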

chepner

Starting with Bash 5.1, there is now an additional way of waiting for and handling multiple background jobs thanks to the introduction of wait -p.

Here's an example:

#!/usr/bin/env bash
for ((i=0; i < 10; i++)); do
    secs=$((RANDOM % 10)); code=$((RANDOM % 256))
    (sleep ${secs}; exit ${code}) &
    echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done

while true; do
    wait -n -p pid; code=$?
    [[ -z "${pid}" ]] && break
    echo "Background job ${pid} finished with code ${code}"
done

The novelty here is that you now know exactly which one of the background jobs finished.
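One practical use of this is mapping the reported PID back to something meaningful. A sketch, assuming bash 5.1 or later; the task names and the rule that beta fails are made up for illustration:

```bash
#!/usr/bin/env bash
declare -A name_of                     # PID -> hypothetical task name
for name in alpha beta gamma; do
  ( sleep "$((RANDOM % 3))"; [[ $name != beta ]] ) &   # beta exits with 1
  name_of[$!]=$name
done

failed=()
while true; do
  wait -n -p pid; code=$?
  [[ -z ${pid} ]] && break             # pid stays unset once no children remain
  echo "${name_of[$pid]} (pid ${pid}) finished with ${code}"
  if (( code != 0 )); then
    failed+=("${name_of[$pid]}")
  fi
done
echo "failed: ${failed[*]:-none}"
```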

Fonic

This question is old, but the scenario it presents, where a deferred wait ends up waiting for some random unrelated process because of a PID collision, hasn't been directly addressed.

It's not possible at the kernel level. The way it works there is that until the parent process calls wait(2)¹, the child process still exists. Because the child still exists, Linux will run out of PIDs rather than reuse its PID. This shows up at times as so-called zombie or "defunct" processes: children that have exited but have not yet been "reaped" by their parent.
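To see a zombie holding on to its PID, the child needs a parent that never calls wait(2). A Linux-only sketch (it reads /proc and relies on fractional sleep, which GNU and BSD sleep support): a subshell starts a short-lived background child and then execs sleep, which never reaps children:

```bash
#!/usr/bin/env bash
tmp=$(mktemp)
(
  sleep 0.2 &              # grandchild that exits after 0.2s
  echo "$!" > "$tmp"       # record its PID for the outer script
  exec sleep 2             # become a process that never reaps the grandchild
) &
sleep 1                    # the grandchild has exited by now

zpid=$(< "$tmp")
state=$(awk '/^State:/ {print $2}' "/proc/${zpid}/status")
echo "PID ${zpid} state: ${state}"   # Z: exited, not yet reaped, PID still taken
rm -f "$tmp"
wait                       # when sleep 2 exits, init (or a subreaper) reaps the zombie
```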

Now, at the shell level you don't have to call wait(1)¹ for child processes to be reaped; bash does this automatically. I haven't confirmed it, but when you run wait $pid for a child PID that exited long ago, I would wager bash realises it has already reaped that child and returns its saved exit status immediately rather than waiting for anything.
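That wager can be checked with a small sketch (the exit code 42 and the one-second delay are arbitrary): if bash keeps the status of children it has already reaped, wait should return it without blocking:

```bash
#!/usr/bin/env bash
( exit 42 ) &          # child that exits right away
pid=$!

sleep 1                # the child is long gone by now

start=$SECONDS
wait "$pid"            # expected not to block: bash saved the status when it reaped the child
status=$?
echo "wait ${pid} returned ${status} after $(( SECONDS - start ))s"
```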

¹ the wait(N) notation is a convention used to disambiguate between API layers - N refers to the section of the manual a command/function is located in. In this case we have:

  • wait(2): the syscall - see man 2 wait
  • wait(1): the shell command - see man 1 wait or help wait

If you want to know what lives in each manual section, try man N intro.

sqweek

Give this a try. It works without having to memoize the backgrounded PIDs (by using jobs -p), preserves exit codes, and exits early if one task fails.

while (($(jobs -p | wc -l) > 0)); do
  if wait -n; then
    :
  else
    ret=$?
    jobs -p | xargs -n1 kill 2>/dev/null
    wait
    exit $ret
  fi
done

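The if wait -n; then : construct lets you capture the exit code in the else branch while still using ERREXIT (set -e); writing if ! wait -n instead would mask it, because $? would then hold the status of the negated command.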
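A small sketch of the difference (bash 4.3 or later for wait -n; the exit code 5 is arbitrary):

```bash
#!/usr/bin/env bash
set -e                 # errexit: commands tested by if do not trigger it

( exit 5 ) &
if ! wait -n; then
  negated=$?           # status of `! wait -n` itself, i.e. 0
fi

( exit 5 ) &
if wait -n; then
  :
else
  real=$?              # the job's own exit status, i.e. 5
fi

echo "negated=${negated} real=${real}"
```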

A one-liner that skips the early exit would be

while read -r pid; do wait "$pid"; done < <(jobs -p)

As to your concerns:

  • jobs only lists background processes under the current process.
  • wait cannot wait on PIDs that aren't children of the current process.
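The second point is easy to check: waiting on a process that is not a child of the current shell fails immediately with status 127, as the manual excerpt in the comments says. In this sketch the shell's own PID ($$) serves as a PID that can never be one of its children:

```bash
#!/usr/bin/env bash
wait "$$" 2>/dev/null   # bash reports "... is not a child of this shell"; hidden here
status=$?
echo "wait on a non-child returned ${status}"   # 127
```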
andsens