73

How to retry a bash command until its status is ok or until a timeout is reached?

My best shot (I'm looking for something simpler):

NEXT_WAIT_TIME=0
COMMAND_STATUS=1
until [ $COMMAND_STATUS -eq 0 || $NEXT_WAIT_TIME -eq 4 ]; do
  command
  COMMAND_STATUS=$?
  sleep $NEXT_WAIT_TIME
  let NEXT_WAIT_TIME=NEXT_WAIT_TIME+1
done
Philippe Blayo
  • 10,610
  • 14
  • 48
  • 65

7 Answers7

82

You can simplify things a bit by putting command right in the test and doing increments a bit differently. Otherwise the script looks fine:

NEXT_WAIT_TIME=0
until [ $NEXT_WAIT_TIME -eq 5 ] || command; do
    sleep $(( NEXT_WAIT_TIME++ ))
done
[ $NEXT_WAIT_TIME -lt 5 ]
carlin.scott
  • 6,214
  • 3
  • 30
  • 35
Grisha Levit
  • 8,194
  • 2
  • 38
  • 53
  • 4
    At least in bash version 4.1.5 you need to change the sleep line to sleep $(( NEXT_WAIT_TIME++ )) – Nightscape Jan 07 '13 at 15:18
  • 6
    Good solution, only problem is after the last "command" fail, you'll still have to sleep for 4 seconds. Not sure if that can be avoided and keep the code this compact. – David Feb 20 '14 at 04:52
  • 2
    @David Really? Will the `||` operator evaluate the right-hand side if the left-hand side returns a truthy value? (I don't know much bash, I just know that in javascript it wouldn't.) – Adrian Schmidt Nov 14 '18 at 14:49
  • 1
    @David on the last iteration the RHS of the `||` will be true so the `sleep` will not happen – Grisha Levit Aug 26 '19 at 01:22
  • @carlin.scott I rolled back you edit because it results in an exit status of 1 if the final execution of the command is successful. – Grisha Levit Apr 05 '20 at 03:40
  • If you _do_ want to preserve the exit status, you can use a function like: `run() { local e t=0; until cmd; do e=$?; ((++t > 4)) && return $e; sleep $t; done; }` – Grisha Levit Apr 05 '20 at 04:07
  • @GrishaLevit I agree that neither situation is desirable. But I think people would prefer the possibility of a non-zero exit code upon failure. I think your suggestion in your last comment could work but it's hard to follow as a one-line statement with single character variables. – carlin.scott Apr 06 '20 at 05:16
  • What is the purpose of placing the accumulate function after sleep? Wouldn't this mean that the sleep duration increases each time? – openCivilisation May 09 '22 at 00:03
  • 1
    @openCivilisation that is indeed the effect achieved. That seems to be what the question was asking for but I've sometimes wondered why this answer received so many votes, since it does not seem like functionality that would frequently be desired. – Grisha Levit May 09 '22 at 02:28
  • I think it would be worthy to suggest underneath what you think the OP might benefit from. – openCivilisation May 09 '22 at 07:58
56

One line and shortest, and maybe the best approach:

timeout 12h bash -c 'until ssh root@mynewvm; do sleep 10; done'

Credited by http://jeromebelleman.gitlab.io/posts/devops/until/

petertc
  • 3,607
  • 1
  • 31
  • 36
  • 3
    I think for most purposes this is the best answer. If you need more unwieldy code you can use functions. Yet it keeps the main thing your trying to achieve short enough to be easy understood. – SlideM May 08 '20 at 07:55
  • Note that `timeout` is a GNU extension, part of GNU `coreutils`, so it's in essence unavailable (by default) on BSD/macOS-based systems. For a POSIX-compliant approximation of this, see https://unix.stackexchange.com/questions/274564/posix-equivalent-for-gnu-timeout – Per Lundberg Mar 09 '22 at 09:11
  • 1
    Amazing, worked wonderfully for my usecase! Had to check until a healthcheck passed in a container running in docker-compose, ended up being `timeout 10s bash -c "until docker-compose ps $CONTAINER_NAME | grep \(healthy\); do sleep 1; done"` – Giora Guttsait Jul 13 '22 at 11:35
18

retry function

This script can be downloaded from retrying-commands-in-shell-scripts

#!/bin/bash
     
# Retries a command on failure.
# $1 - the max number of attempts
# $2... - the command to run
retry() {
   local -r -i max_attempts="$1"; shift
   local -r cmd="$@"
   local -i attempt_num=1
  
   until $cmd
      do
         if (( attempt_num == max_attempts ))
         then
            echo "Attempt $attempt_num failed and there are no more attempts left!"
            return 1
         else
            echo "Attempt $attempt_num failed! Trying again in $attempt_num seconds..."
            sleep $(( attempt_num++ ))
         fi
   done
}
  • example usage:

    retry 5 ls -ltr foo
    
  • if you want to retry an function in your script, use it like shown below:

   # example usage:
   foo()
   {
      #whatever you want do.
   }

   declare -fxr foo
   retry 3 timeout 60 bash -ce 'foo'
abu_bua
  • 1,361
  • 17
  • 25
rhinoceros.xn
  • 803
  • 9
  • 12
14

Put together some tools.

retry: https://github.com/kadwanev/retry

timeout: http://manpages.courier-mta.org/htmlman1/timeout.1.html

Then see the magic

retry timeout 3 ping google.com

PING google.com (173.194.123.97): 56 data bytes
64 bytes from 173.194.123.97: icmp_seq=0 ttl=55 time=13.982 ms
64 bytes from 173.194.123.97: icmp_seq=1 ttl=55 time=44.857 ms
64 bytes from 173.194.123.97: icmp_seq=2 ttl=55 time=64.187 ms
Before retry #1: sleeping 0.3 seconds
PING google.com (173.194.123.103): 56 data bytes
64 bytes from 173.194.123.103: icmp_seq=0 ttl=55 time=56.549 ms
64 bytes from 173.194.123.103: icmp_seq=1 ttl=55 time=60.220 ms
64 bytes from 173.194.123.103: icmp_seq=2 ttl=55 time=8.872 ms
Before retry #2: sleeping 0.6 seconds
PING google.com (173.194.123.103): 56 data bytes
64 bytes from 173.194.123.103: icmp_seq=0 ttl=55 time=25.819 ms
64 bytes from 173.194.123.103: icmp_seq=1 ttl=55 time=16.382 ms
64 bytes from 173.194.123.103: icmp_seq=2 ttl=55 time=3.224 ms
Before retry #3: sleeping 1.2 seconds
PING google.com (173.194.123.103): 56 data bytes
64 bytes from 173.194.123.103: icmp_seq=0 ttl=55 time=58.438 ms
64 bytes from 173.194.123.103: icmp_seq=1 ttl=55 time=94.828 ms
64 bytes from 173.194.123.103: icmp_seq=2 ttl=55 time=61.075 ms
Before retry #4: sleeping 2.4 seconds
PING google.com (173.194.123.103): 56 data bytes
64 bytes from 173.194.123.103: icmp_seq=0 ttl=55 time=43.361 ms
64 bytes from 173.194.123.103: icmp_seq=1 ttl=55 time=32.171 ms
...

Check exit status for ultimate pass/fail.

nkadwa
  • 839
  • 8
  • 16
1

For anyone wanting to actually wait until some time passed, taking into account the time of your command might be significant:

TIMEOUT_SEC=180
start_time="$(date -u +%s)"
while [ condition_or_just_true ]; do
  current_time="$(date -u +%s)"
  elapsed_seconds=$(($current_time-$start_time))
  if [ $elapsed_seconds -gt $TIMEOUT_SEC ]; then
    echo "timeout of $TIMEOUT_SEC sec"
    exit 1
  fi
  echo "another attempt (elapsed $elapsed_seconds sec)"
  some_command_and_maybe_sleep
done
Mugen
  • 8,301
  • 10
  • 62
  • 140
0

I found this to do what I was looking for:

function wait_for_success() {
    local timeout start_time end_time
    timeout=${TIMEOUT:-60}
    interval=${INTERVAL:-2}
    start_time=$(date +%s)
    end_time=$((start_time + timeout))
    while [ $(date +%s) -lt $end_time ]; do
        if $@; then
            return 0
        fi
        sleep $interval
    done
    >&2 echo "Timeout exceeded."
    return 1
}
Niklas R
  • 16,299
  • 28
  • 108
  • 203
-1

I made some tweaks to this answer which let you switch on whether the timeout was reached, or whether the command succeed. Also, in this version there is a retry every second:

ELAPSED=0
started=$(mktemp)
echo "False" > $started
until the_command_here && echo "True" > $started || [ $ELAPSED -eq 30 ]
do
   sleep 1
   (( ELAPSED++ ))
done

if [[ $(cat $started) == "True" ]]                                                                                                                                                                                                                            
then                                                                                                                    
    echo "the command completed after $ELAPSED seconds"                                                                                              
else                                                                                                                    
    echo "timed out after $ELAPSED seconds"                                                                               
    exit 111                                                                                                            
fi
MatrixManAtYrService
  • 8,023
  • 1
  • 50
  • 61