1

Say I have this pseudocode in bash:

#!/bin/bash

things    
for i in {1..3}
do
    nohup someScript[i] &

done
wait

for i in {4..6}
do
    nohup someScript[i] &

done
wait
otherThings

and say that someScript[i] sometimes ends up hanging.

Is there a way I can take the process IDs (with $!) and check periodically whether a process has been running for more than a specified amount of time, after which I want to kill the hung processes with kill -9?

Marco Pietrosanto
  • You can create a watchdog for each process that you fire. To implement the watchdog processes, you can run your main processes with `time -o run.${!}.time nohup someScript[i]` and start the watchdogs as background processes after you have launched the main processes. The watchdog should evaluate the statistics stored by the `time` command. Another, probably easier, way is to also store the timestamps at which you started your script processes in an array, and then check the background-process array against the timestamps array and the current time in a background process launched just before the wait command. – Marc Bredt Jun 24 '15 at 12:51

2 Answers

1

Unfortunately, the answer from @Eugeniu did not work for me: timeout gave an error.

However, I found the following routine useful, so I'll post it here for anyone who runs into the same problem.

Create another script which goes like this:

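#!/bin/bash
# monitor.sh: kill the process with the given PID if it outlives the countdown

pid=$1

counter=10   # number of checks before the kill (one check per sleep interval)
while ps -p "$pid" > /dev/null
do
    if [[ $counter -eq 0 ]] ; then
        # if it's still there then kill it
        kill -9 "$pid"
    fi
    counter=$((counter-1))
    sleep 1
done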

Then, in the main script, you just put:

things    
for i in {1..3}
do
    nohup someScript[i] &
    ./monitor.sh $! &
done
wait

This way, each someScript gets a parallel process that checks at a chosen interval whether it is still there (up to the maximum time set by the counter), and that quits by itself if the associated process dies (or gets killed).
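For anyone who wants to try the idea without a separate monitor.sh file, here is a minimal self-contained sketch of the same pattern. It is not the exact script above: the function name `monitor` and its second argument are my own hypothetical choices, `kill -0` is swapped in for `ps -p` as the liveness check, and the `sleep` commands are stand-ins for someScript[i].

```bash
#!/bin/bash
# Sketch only: monitor() and its max_seconds argument are hypothetical names.

# Poll a PID once per second; send SIGKILL if it is still alive after max_seconds.
monitor() {
    local pid=$1 max_seconds=$2 elapsed=0
    while kill -0 "$pid" 2>/dev/null; do      # kill -0 only tests that the PID exists
        if (( elapsed >= max_seconds )); then
            kill -9 "$pid" 2>/dev/null
            return
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Stand-ins for someScript[i]: one job finishes quickly, one "hangs".
sleep 1 &
quick_pid=$!
monitor "$quick_pid" 3 &

sleep 30 &
hung_pid=$!
monitor "$hung_pid" 3 &

wait "$quick_pid"; quick_status=$?
wait "$hung_pid";  hung_status=$?
wait    # let the monitors finish as well

echo "quick job exit status: $quick_status"
echo "hung job exit status:  $hung_status"
```

The quick job should report status 0 and the killed one 137 (128 + 9, the usual shell convention for a process ended by SIGKILL), so the main script can tell afterwards which jobs were cut off.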

Marco Pietrosanto
  • So, in `monitor.sh` you set the timeout to 10s for every job started with `nohup`? – Eugeniu Rosca Jun 25 '15 at 13:01
  • You can control it: the maximum time allowed for each process would be * while the process is checked every seconds. – Marco Pietrosanto Jun 27 '15 at 09:10
  • I don't see the difference between the while loop in your `monitor.sh` and these 3 commands: `pid=$1; sleep ; kill -9 $pid` – Eugeniu Rosca Jun 27 '15 at 09:14
  • Not sure I understand your doubt, but I'll try to be clearer: the while condition ps -p $pid is true only if the process is alive. So whenever the process ends by itself, this condition is no longer true and monitor.sh ends as well. If the process hangs, the while loop continues until * seconds, when the counter reaches 0 and the process gets killed by the monitor (which then exits itself, because the while condition is now false). – Marco Pietrosanto Jun 27 '15 at 12:57
0

One possible approach:

#!/bin/bash

# things
mypids=()
for i in {1..3}; do
    # launch the script with timeout (3600s)
    timeout 3600 nohup someScript[i] &
    mypids[i]=$! # store the PID
done

wait "${mypids[@]}"
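
To show how this could fit the "three at a time" requirement, here is a hedged sketch that assumes the GNU coreutils `timeout` is installed and working. The helper `run_batch`, the `statuses` array and the 2-second limit are hypothetical and only for illustration; `sleep` stands in for someScript[i], and `nohup` is left out to keep the demo short.

```bash
#!/bin/bash
# Sketch only: run_batch is a hypothetical helper; sleep stands in for someScript[i].

limit=2        # seconds allowed per job (the snippet above uses 3600)
statuses=()    # exit status of every job, in launch order

run_batch() {
    local pids=() pid duration
    for duration in "$@"; do
        # send TERM after $limit seconds, then KILL one second later if TERM is ignored
        timeout -k 1 "$limit" sleep "$duration" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid"
        statuses+=("$?")    # 124 means timeout had to stop the job
    done
}

run_batch 1 30 1    # first batch: the middle job "hangs"
run_batch 1 1       # second batch starts only after the first batch has ended
echo "exit statuses: ${statuses[*]}"
```

Each batch returns as soon as its jobs have finished or been stopped, so nothing sleeps for a fixed amount of time, and an exit status of 124 marks the jobs that hit the limit.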
Eugeniu Rosca
  • I actually need that "wait" command to avoid filling up the server capacity. Every process takes one processor out of 8, which is why I'm waiting for the 3 processes to end before firing the others. How would you integrate the wait command into this approach? Sorry, I see now that the wait command was in the wrong place. – Marco Pietrosanto Jun 24 '15 at 12:32
  • @MarcoPietrosanto Check all the pids from the first processes first, then continue the script when they have all ended or been killed. – 123 Jun 24 '15 at 12:34
  • @MarcoPietrosanto: `wait` will simply block the execution of your script until all processes are finished. So you want to first sleep for a certain amount of time, then check whether any process is still alive, then kill those processes, then move on. – Eugeniu Rosca Jun 24 '15 at 12:37
  • They usually end within some minutes (variable), and wait catches that properly. Sleeping for a fixed amount of time before going on "whatever happens" is not what I need. I need something that: - if the processes go as they should, just carries on without waiting a fixed amount of time - if a process hangs (runs for over an hour or something like that), kills it. – Marco Pietrosanto Jun 24 '15 at 12:39
  • @Marco Pietrosanto: check the updated answer. Also take inspiration from the answers posted [here](http://stackoverflow.com/questions/10028820/bash-wait-with-timeout). – Eugeniu Rosca Jun 24 '15 at 12:55
  • timeout gives me an error. I actually solved the problem in another way (posted as an answer), but thanks for the suggestions! – Marco Pietrosanto Jun 25 '15 at 12:51