0

I have an Azure Storage account in WEST US with Geo-Replication enabled to sync with EAST US and I want to perform the failover using the bash script on demand basis.

I have defined the following function

_STORAGE_ACCOUNT_FAILOVER () {
    echo "Storage account failover is initiated.."

    az storage account failover --name $STORAGEACCOUNT --no-wait --yes

    echo "Storage account failover is completed successfully.."
}

while trying to execute the above function, I got the following error

ERROR: (ResourceCollectionRequestsThrottled) Operation 'Microsoft.Storage/storageAccounts/read' failed as server encountered too many requests. 
Please try after '17' seconds. Tracking Id is ''.

I want to implement the retry logic incase of any issues? how do I implement the retry logic?

something like

    _STORAGE_ACCOUNT_FAILOVER () {
        echo "Storage account failover is initiated.."

performFailover:
        az storage account failover --name $STORAGEACCOUNT --no-wait --yes

if [ "$?" -ne 0 ]; then
    goto performFailover;
fi
    
        echo "Storage account failover is completed successfully.."
    }

or something like this

_STORAGE_ACCOUNT_FAILOVER () {
    echo "Storage account failover is initiated.."

    while true; do
      az storage account failover --name $STORAGEACCOUNT --no-wait --yes

      if [ "$?" -eq 0 ]; then
        break;
      fi

      sleep 30s
    done

    echo "Storage account failover is completed successfully.."
}
One Developer
  • 99
  • 5
  • 43
  • 103
  • See also [Why is testing ”$?” to see if a command succeeded or not, an anti-pattern?](https://stackoverflow.com/questions/36313216/why-is-testing-to-see-if-a-command-succeeded-or-not-an-anti-pattern) – tripleee Feb 24 '21 at 10:58

1 Answers1

1

The human-readable error message is somewhat problematic; can you get the requested retry wait in machine-readable form instead?

Here's a quick sketch, assuming that az sets a nonzero exit code on timeout. (If you are lucky, it will have a unique exit code for this specific error.)

_STORAGE_ACCOUNT_FAILOVER () {
    echo "$0: Storage account failover initiated" >&2
    while true; do
        if result=$(az storage account failover \
            --name "$STORAGEACCOUNT" \
            --no-wait --yes 2>&1 >/dev/null)
        then
            break
        else
            rc=$?
            case $result in
             *"Please try after '"*)
                seconds=${result#*Please try after \'}
                seconds=${seconds%%\'*}
                wait "$seconds"
                ;;
             *) echo "$0: $result" >&2
                exit $rc
                ;;
            esac
        fi
    done
    echo "$0: Storage account failover completed successfully" >&2
}

Notice how we capture standard error from the az command and parse out the timeout message, or return an error if we can't. Notice also how all diagnostic messages are printed to standard error, and formatted telegraphically, yet always with the script's name in the message so you can tell which script failed if you have scripts calling scripts calling scripts etc.

6ugr3
  • 411
  • 4
  • 7
tripleee
  • 175,061
  • 34
  • 275
  • 318