
How to implement exponential backoff in Azure Functions?

I have a function that depends on an external API. I would like to handle the unavailability of this service using a retry policy. The function is triggered when a new message appears in the queue, and in this case a retry policy is turned on by default:

For most triggers, there is no built-in retry when errors occur during function execution. The two triggers that have retry support are Azure Queue storage and Azure Blob storage. By default, these triggers are retried up to five times. After the fifth retry, both triggers write a message to a special poison queue.

Unfortunately, the retry starts immediately after the exception (TimeSpan.Zero), which is pointless here because the service is most likely still unavailable. Is there a way to dynamically control when the message becomes visible in the queue again?

I know that I can set visibilityTimeout (host.json reference), but it's set for all queues and that is not what I want to achieve here.

I found one workaround, but it is far from an ideal solution. In case of an exception, we can add the message to the queue again and set visibilityTimeout for this message:

[FunctionName("Test")]
public static async Task Run([QueueTrigger("queue-test")]string myQueueItem, TraceWriter log,
    ExecutionContext context, [Queue("queue-test")] CloudQueue outputQueue)
{
    if (true) // placeholder for "the call to the external API failed"
    {
        log.Error("Error message");
        // Re-enqueue a copy of the message that only becomes visible after one minute.
        await outputQueue.AddMessageAsync(new CloudQueueMessage(myQueueItem), TimeSpan.FromDays(7), // timeToLive
            TimeSpan.FromMinutes(1), // <-- visibilityTimeout (initialVisibilityDelay)
            null, null).ConfigureAwait(false);
        return;
    }
}

Unfortunately, this solution is weak because it has no context: I do not know which attempt it is, so I cannot limit the number of retries or increase the delay between them (exponential backoff).

An internal retry policy (retrying inside the function itself) is also undesirable, because it can drastically increase costs (pricing models).

Pawel Maga
  • It looks like you did your homework, so you know the options. I don't think there is any other solution except re-sending the queue message again. Depending on your scenario, you could add some metadata to the message, but you have to implement that yourself. – Mikhail Shilkov Jun 12 '18 at 13:09
  • @Mikhail I thought about this solution, but it seems a bit tricky to me. I would like to avoid expanding this object, but if there is no other solution, I will do it. – Pawel Maga Jun 12 '18 at 13:14
  • I'd recommend opening an issue here https://github.com/Azure/azure-functions-host/issues since it's likely a feature request. also Mathew would have a better idea if there is another workaround. – ahmelsayed Jun 12 '18 at 19:15
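
The metadata idea from the comments could be sketched roughly as follows. This is only an illustration under assumptions: `RetryEnvelope`, `Backoff`, `NextDelay` and `TryNextAttempt` are hypothetical names that are not part of the Azure SDK, and the doubling formula is just one common choice of backoff. The attempt counter travels inside the message, so the function knows which attempt it is handling.

```csharp
using System;

// Hypothetical envelope (not an SDK type): wraps the original payload
// together with the number of attempts made so far.
public sealed class RetryEnvelope
{
    public string Payload { get; set; } = "";
    public int Attempt { get; set; } // 0 for the first delivery
}

public static class Backoff
{
    // Delay doubles with every attempt: baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 0) throw new ArgumentOutOfRangeException(nameof(attempt));
        double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt);
        // The comparison also covers overflow to infinity for very large attempts.
        return seconds >= maxDelay.TotalSeconds ? maxDelay : TimeSpan.FromSeconds(seconds);
    }

    // Builds the envelope for the next attempt.
    // Returns false once maxAttempts deliveries have been used up.
    public static bool TryNextAttempt(RetryEnvelope current, int maxAttempts, out RetryEnvelope next)
    {
        next = new RetryEnvelope { Payload = current.Payload, Attempt = current.Attempt + 1 };
        return next.Attempt < maxAttempts;
    }
}
```

In the queue function, the envelope would be deserialized from myQueueItem; on failure, the serialized `next` envelope is re-enqueued with `Backoff.NextDelay(envelope.Attempt, ...)` passed as the visibility delay argument of AddMessageAsync, as in the question's code. When `TryNextAttempt` returns false, the message can be routed to a poison queue of your choice instead.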

5 Answers


Microsoft added retry policies around November 2020 (preview), which support exponential backoff:

[FunctionName("Test")]
[ExponentialBackoffRetry(5, "00:00:04", "00:15:00")] // retries with delays increasing from 4 seconds to 15 minutes
public static async Task Run([QueueTrigger("queue-test")]string myQueueItem, TraceWriter log, ExecutionContext context)
{
    // ...
}
Douglas

I had a similar problem and ended up using Durable Functions, which have an automatic retry feature built in. To use it, wrap your external API call in an activity function; when calling that activity, you can configure the retry behavior through the options object. You can set the following options:

Max number of attempts: The maximum number of retry attempts.

First retry interval: The amount of time to wait before the first retry attempt.

Backoff coefficient: The coefficient used to determine rate of increase of backoff. Defaults to 1.

Max retry interval: The maximum amount of time to wait in between retry attempts.

Retry timeout: The maximum amount of time to spend doing retries. The default behavior is to retry indefinitely.

Handle: A user-defined callback can be specified to determine whether a function should be retried.
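
As a rough sketch of how these options might be used, assuming the in-process Durable Functions 2.x extension (Microsoft.Azure.WebJobs.Extensions.DurableTask): the function names "CallApiOrchestrator" and "CallExternalApi" and the concrete intervals are made up for illustration, and the activity body is only a placeholder for the real API call.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class CallApiOrchestration
{
    [FunctionName("CallApiOrchestrator")] // hypothetical name
    public static async Task<string> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // First retry after 4 seconds, at most 5 attempts in total.
        var retryOptions = new RetryOptions(
            firstRetryInterval: TimeSpan.FromSeconds(4),
            maxNumberOfAttempts: 5)
        {
            BackoffCoefficient = 2.0,                    // double the wait after each failure
            MaxRetryInterval = TimeSpan.FromMinutes(15)  // never wait longer than this
        };

        string input = context.GetInput<string>();

        // The orchestrator is unloaded while it waits between attempts.
        return await context.CallActivityWithRetryAsync<string>(
            "CallExternalApi", retryOptions, input);
    }

    [FunctionName("CallExternalApi")] // hypothetical name
    public static Task<string> CallExternalApi([ActivityTrigger] string input)
    {
        // Placeholder: call the external API here and let exceptions propagate
        // so that the retry options above take effect.
        return Task.FromResult(input);
    }
}
```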

Community

One option to consider is to have your Function invoke a Logic App that waits for the desired amount of time and then invokes the Function again. You could also add other retry logic (such as a limit on the number of attempts) to the Logic App, using some persistent storage to tally your attempts. You would only invoke the Logic App if there was a connection issue.

Alternatively, you could shift the starting point of your process to Logic Apps, as a Logic App can also be triggered by (think: bound to) queue messages. In either case, Logic Apps adds the ability to pause and re-invoke the Function and/or process.

KWilson

If you are using Service Bus and explicitly completing/dead-lettering messages ("autoComplete": false), here's a helper function that will exponentially delay and retry until the max delivery count is reached:

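using System;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

// Globals.MaxDeliveryCount and Globals.ExponentialBackoff are the application's own settings.
public static async Task ExceptionHandler(IMessageSession messageSession, string lockToken, int deliveryCount)
{
    if (deliveryCount < Globals.MaxDeliveryCount)
    {
        // Wait ExponentialBackoff^deliveryCount seconds, then release the message for redelivery.
        // The message stays locked during the wait, so the delay should fit within the lock (or auto-renew) duration.
        var delaySeconds = Math.Pow(Globals.ExponentialBackoff, deliveryCount);
        await Task.Delay(TimeSpan.FromSeconds(delaySeconds));
        await messageSession.AbandonAsync(lockToken);
    }
    else
    {
        // Delivery attempts are exhausted: move the message to the dead-letter queue.
        await messageSession.DeadLetterAsync(lockToken);
    }
}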
Rachel

Since November 2022, function-level retries are no longer supported for QueueTrigger (source).

Instead, you have to configure retries through the binding extension settings in host.json (this example configures the Service Bus extension):

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions":{
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            }
        }
    }
}
Jeremy Caney