
I wrote a script that fetches data from HDFS in parallel and then waits for the child processes in a for loop. Sometimes wait reports "pid is not a child of this shell"; at other times it works fine, which is puzzling. I used jobs -l to list all the jobs running in the background, and I am sure these PIDs are child processes of the shell process. I also used ps aux to make sure the PIDs had not been assigned to other processes. Here is my script.

PID=()
FILE=()
let serial=0

while read index_tar
do
        echo $index_tar | grep index > /dev/null 2>&1

        if [[ $? -ne 0 ]]
        then
                continue
        fi

        suffix=`printf '%03d' $serial`
        mkdir input/output_$suffix
        $HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
                && mv input/output_$suffix/index_* input/output_$suffix/index &

        PID[$serial]=$!
        FILE[$serial]=$index_tar

        let serial++

done < file.list

for((i=0;i<$serial;i++))
do
        wait ${PID[$i]}

        if [[ $? -ne 0 ]]
        then
                LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
                exit -1
        else
                LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
        fi
done
Benjamin Loison
henshao
  • A good question, I am getting exactly the same error. I launched 96 background jobs and waited for them. 4 of the 96 gave me the "pid 28991 (this number is the random child PID as an example) is not a child of this shell". I assume that the wait command is not foolproof. I will do some digging. – Kemin Zhou Nov 30 '18 at 02:08

3 Answers


Find the process ID of the process you want to wait for and substitute it for 12345 in the script below. Adjust the script further to suit your requirements.

#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
    sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log

/usr/bin/waitingScript.sh

http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
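
The /proc check above is Linux-specific. As a rough sketch of the same polling idea, the loop can test the process with kill -0 instead, which sends no signal and only reports whether the PID exists and may be signalled. The helper name wait_for_pid, its interval and max_polls parameters, and the sleep 1 demo job are illustrative assumptions, not part of the original answer. Note that a PID can be reused by an unrelated process, so this is only a best-effort check.

```bash
#!/usr/bin/env bash

# wait_for_pid PID [INTERVAL_SECONDS] [MAX_POLLS]
# Hypothetical helper: polls until PID no longer exists.
# Returns 0 when the process is gone, 1 if MAX_POLLS is exceeded.
wait_for_pid() {
    local pid=$1 interval=${2:-1} max_polls=${3:-600} polls=0
    # kill -0 delivers no signal; it succeeds only while the PID exists
    # and the caller is allowed to signal it.
    while kill -0 "$pid" 2>/dev/null; do
        (( ++polls > max_polls )) && return 1
        sleep "$interval"
    done
    return 0
}

# Demo: start a short-lived process and poll for it instead of using wait.
sleep 1 &
target=$!
wait_for_pid "$target" 1 30
rc=$?
echo "wait_for_pid returned $rc for PID $target"
```

Unlike wait, this approach cannot recover the exit status of the process; it only tells you that the process has gone away.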

pylover
Parvinder Singh

Either your while loop or the for loop runs in a subshell, which is why you cannot await a child of the (parent, outer) shell.

Edit: this might happen if the while loop or for loop is actually

(a) in a (...) subshell block, or a {...} block that is backgrounded or piped (b) participating in a pipe (e.g. for....done|somepipe)
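
A small sketch of this effect, assuming bash: a job started at the top level is waited for once from inside a ( ... ) subshell and once from the shell that started it. The sleep 1 job and the variable names are only for illustration. Printing $BASHPID, $$ and $BASH_SUBSHELL in both places shows whether the two waits really run in the same process.

```bash
#!/usr/bin/env bash

# Start a background job from the top-level shell and remember its PID.
sleep 1 &
pid=$!
echo "top level: BASHPID=$BASHPID \$\$=$$ BASH_SUBSHELL=$BASH_SUBSHELL"

# A ( ... ) subshell is a separate process. The job above is its sibling,
# not its child, so wait is expected to fail here with a non-zero status.
(
    echo "subshell:  BASHPID=$BASHPID \$\$=$$ BASH_SUBSHELL=$BASH_SUBSHELL"
    wait "$pid" 2>/dev/null
)
sub_status=$?

# Back in the shell that actually started the job, wait succeeds.
wait "$pid"
top_status=$?

echo "wait in subshell: $sub_status, wait at top level: $top_status"
```

If $BASHPID differs between the place where the job is started and the place where wait is called, the wait is running in a subshell.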

sehe
  • You could check this line of thinking, nonetheless, e.g. printing $BASHPID, $$, $BASH_SUBSHELL in both locations (and at the toplevel of your script!) – sehe Nov 08 '11 at 10:25

If you're running this in a container of some sort, the condition can apparently be caused by a bug in bash that is easier to encounter in a containerized environment.

From my reading of the bash source (specifically see comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").
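
One possible workaround, sketched here under assumptions and not taken from the bash source or the question: instead of asking the shell for each job's status by PID, let every background job write its own exit status to a file, call wait with no arguments, and read the files afterwards. The fetch function is a hypothetical stand-in for the hadoop/tar pipeline in the question, and the status directory layout is made up for the example.

```bash
#!/usr/bin/env bash

# Temporary directory that holds one exit-status file per job.
status_dir=$(mktemp -d)

# Hypothetical stand-in for the real download; job 1 fails on purpose.
fetch() {
    [ "$1" -ne 1 ]
}

for i in 0 1 2; do
    (
        fetch "$i"
        # Record the job's own exit status so the parent does not depend
        # on the shell's remembered status for this PID.
        echo "$?" > "$status_dir/$i"
    ) &
done

# wait with no arguments waits for all current background children,
# so no individual PID has to be looked up.
wait

failed=()
for i in 0 1 2; do
    rc=$(cat "$status_dir/$i")
    if [ "$rc" -ne 0 ]; then
        failed+=("$i")
    fi
done

rm -rf "$status_dir"
echo "failed jobs: ${failed[*]}"
```

This trades the per-PID wait for a little file I/O. Whether it sidesteps the pruning behavior in every bash version is an assumption worth testing in your own environment.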

jhfrontz
  • I think I'm running into this issue. Do you know if it will be fixed at all? – Clete2 Apr 25 '22 at 11:18
  • @Clete2 it's apparently been this way for 8+ years and the [exacerbating] behavior is seemingly at least partially mandated by POSIX compliance. I wouldn't expect it to change anytime soon. – jhfrontz Apr 27 '22 at 21:14