
I am evaluating the Polly library in terms of features and flexibility, and as part of the evaluation process I am trying to combine the WaitAndRetryPolicy with the BulkheadPolicy, to achieve a combination of resiliency and throttling. The problem is that the resulting behavior of this combination does not match my expectations and preferences. What I would like is to prioritize the retrying of failed operations over executing fresh/unprocessed operations.

The rationale is that (in my experience) a failed operation has a greater chance of failing again. So if all failed operations get pushed to the end of the whole process, that last part of the process will be painfully slow and unproductive: not only because these operations may fail again, but also because of the required delay between retries, which may need to grow progressively longer after each failed attempt. So what I want is for the BulkheadPolicy, each time it has room to start a new operation, to pick a pending retry from its queue if one exists.

Here is an example that demonstrates the undesirable behavior I would like to fix. 10 items need to be processed. All fail on their first attempt and succeed on their second attempt, resulting in a total of 20 executions. The waiting period before retrying an item is one second. Only 2 operations should be active at any moment:

var policy = Policy.WrapAsync
(
    Policy
        .Handle<HttpRequestException>()
        .WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1)),

    Policy.BulkheadAsync(
        maxParallelization: 2, maxQueuingActions: Int32.MaxValue)
);

var tasks = new List<Task>();
foreach (var item in Enumerable.Range(1, 10))
{
    int attempt = 0;
    tasks.Add(policy.ExecuteAsync(async () =>
    {
        attempt++;
        Console.WriteLine($"{DateTime.Now:HH:mm:ss} Starting #{item}/{attempt}");
        await Task.Delay(1000);
        if (attempt == 1) throw new HttpRequestException();
    }));
}
await Task.WhenAll(tasks);

Output (actual):

09:07:12 Starting #1/1
09:07:12 Starting #2/1
09:07:13 Starting #3/1
09:07:13 Starting #4/1
09:07:14 Starting #5/1
09:07:14 Starting #6/1
09:07:15 Starting #8/1
09:07:15 Starting #7/1
09:07:16 Starting #10/1
09:07:16 Starting #9/1
09:07:17 Starting #2/2
09:07:17 Starting #1/2
09:07:18 Starting #4/2
09:07:18 Starting #3/2
09:07:19 Starting #5/2
09:07:19 Starting #6/2
09:07:20 Starting #7/2
09:07:20 Starting #8/2
09:07:21 Starting #10/2
09:07:21 Starting #9/2

The expected output should be something like this (I wrote it by hand):

09:07:12 Starting #1/1
09:07:12 Starting #2/1
09:07:13 Starting #3/1
09:07:13 Starting #4/1
09:07:14 Starting #1/2
09:07:14 Starting #2/2
09:07:15 Starting #3/2
09:07:15 Starting #4/2
09:07:16 Starting #5/1
09:07:16 Starting #6/1
09:07:17 Starting #7/1
09:07:17 Starting #8/1
09:07:18 Starting #5/2
09:07:18 Starting #6/2
09:07:19 Starting #7/2
09:07:19 Starting #8/2
09:07:20 Starting #9/1
09:07:20 Starting #10/1
09:07:22 Starting #9/2
09:07:22 Starting #10/2

For example, at the 09:07:14 mark the 1-second wait period of the failed item #1 has expired, so its second attempt should be prioritized over the first attempt of item #5.

An unsuccessful attempt to solve this problem was to reverse the order of the two policies. Unfortunately, putting the BulkheadPolicy before the WaitAndRetryPolicy results in reduced parallelization. What happens is that the BulkheadPolicy considers all retries of an item to be a single operation, so the "wait" phase between two retries counts towards the parallelization limit. Obviously I don't want that. The documentation also makes it clear that the order of the two policies in my example is correct:
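For reference, here is a sketch of the reversed composition described above (the same two policies, with the Bulkhead moved to the outer position). This is the variant that exhibits the reduced parallelization, because each item occupies a Bulkhead slot for the entire duration of all its attempts, including the waits between them:

```csharp
// Unsuccessful variant: Bulkhead outermost. The 1-second waits between
// retries now count against the maxParallelization limit, because the
// Bulkhead sees each item's whole retry sequence as a single operation.
var reversedPolicy = Policy.WrapAsync
(
    Policy.BulkheadAsync(
        maxParallelization: 2, maxQueuingActions: Int32.MaxValue),

    Policy
        .Handle<HttpRequestException>()
        .WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1))
);
```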

BulkheadPolicy: Usually innermost unless wraps a final TimeoutPolicy. Certainly inside any WaitAndRetry. The Bulkhead intentionally limits the parallelization. You want that parallelization devoted to running the delegate, not occupied by waits for a retry.

Is there any way to achieve the behavior I want, while staying in the realm of the Polly library?

Peter Csala
Theodor Zoulias
  • Obviously you can't do this out of the box. I would just download the source and make your own overload and then do a pull request, or request the feature on GitHub – TheGeneral Oct 30 '20 at 07:55
  • @TheGeneral yes, this is the slow, painful and laborious way to solve the problem. I am hoping that some experienced user of the Polly library knows some easy way to solve it. – Theodor Zoulias Oct 30 '20 at 08:01
  • I use it every day, however I can't see a way to do this in a straightforward way. Maybe someone else can think of an out of the square solution. Good luck – TheGeneral Oct 30 '20 at 08:06
  • @TheGeneral thanks! In case you come up with any idea, please share it in the comments. :-) – Theodor Zoulias Oct 30 '20 at 08:12

1 Answer


I found a simple but not perfect solution to this problem. The solution is to include a second BulkheadPolicy positioned before the WaitAndRetryPolicy (in an "outer" position). This extra Bulkhead serves only to reprioritize the workload (by acting as an outer queue), and should have a substantially larger capacity (x10 or more) than the inner Bulkhead that controls the parallelization. The reason is that the outer Bulkhead could also affect (reduce) the parallelization in an unpredictable way, and we don't want that. This is why I consider this solution imperfect: neither is the prioritization optimal, nor is it guaranteed that the parallelization will not be affected.

Here is the combined policy of the original example, enhanced with an outer BulkheadPolicy. Its capacity is only 2.5 times larger, which is suitable for this contrived example, but too small for the general case:

var policy = Policy.WrapAsync
(
    Policy.BulkheadAsync( // For improving prioritization
        maxParallelization: 5, maxQueuingActions: Int32.MaxValue),

    Policy
        .Handle<HttpRequestException>()
        .WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1)),

    Policy.BulkheadAsync( // For controlling parallelization
        maxParallelization: 2, maxQueuingActions: Int32.MaxValue)
);

And here is the output of the execution:

12:36:02 Starting #1/1
12:36:02 Starting #2/1
12:36:03 Starting #3/1
12:36:03 Starting #4/1
12:36:04 Starting #2/2
12:36:04 Starting #5/1
12:36:05 Starting #1/2
12:36:05 Starting #3/2
12:36:06 Starting #6/1
12:36:06 Starting #4/2
12:36:07 Starting #8/1
12:36:07 Starting #5/2
12:36:08 Starting #9/1
12:36:08 Starting #7/1
12:36:09 Starting #10/1
12:36:09 Starting #6/2
12:36:10 Starting #7/2
12:36:10 Starting #8/2
12:36:11 Starting #9/2
12:36:11 Starting #10/2

Although this solution is not perfect, I believe that it should do more good than harm in the general case, and should result in better performance overall.

Theodor Zoulias