I am evaluating the Polly library in terms of features and flexibility, and as part of the evaluation I am trying to combine the WaitAndRetryPolicy and BulkheadPolicy policies, to achieve a combination of resiliency and throttling. The problem is that the resulting behavior of this combination does not match my expectations and preferences. What I would like is to prioritize the retrying of failed operations over executing fresh/unprocessed operations.
The rationale is that (from my experience) a failed operation has greater chances of failing again. So if all failed operations get pushed to the end of the whole process, that last part of the process will be painfully slow and unproductive, not only because these operations may fail again, but also because of the required delay between retries, which may need to grow progressively longer after each failed attempt. So what I want is that each time the BulkheadPolicy has room for starting a new operation, it chooses a retry operation if there is one in its queue.
Here is an example that demonstrates the undesirable behavior I would like to fix. 10 items need to be processed. All fail on their first attempt and succeed on their second attempt, resulting in a total of 20 executions. The waiting period before retrying an item is one second. Only 2 operations should be active at any moment:
    var policy = Policy.WrapAsync
    (
        Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1)),
        Policy.BulkheadAsync(
            maxParallelization: 2, maxQueuingActions: Int32.MaxValue)
    );

    var tasks = new List<Task>();
    foreach (var item in Enumerable.Range(1, 10))
    {
        int attempt = 0;
        tasks.Add(policy.ExecuteAsync(async () =>
        {
            attempt++;
            Console.WriteLine($"{DateTime.Now:HH:mm:ss} Starting #{item}/{attempt}");
            await Task.Delay(1000);
            if (attempt == 1) throw new HttpRequestException();
        }));
    }
    await Task.WhenAll(tasks);
Output (actual):
09:07:12 Starting #1/1
09:07:12 Starting #2/1
09:07:13 Starting #3/1
09:07:13 Starting #4/1
09:07:14 Starting #5/1
09:07:14 Starting #6/1
09:07:15 Starting #8/1
09:07:15 Starting #7/1
09:07:16 Starting #10/1
09:07:16 Starting #9/1
09:07:17 Starting #2/2
09:07:17 Starting #1/2
09:07:18 Starting #4/2
09:07:18 Starting #3/2
09:07:19 Starting #5/2
09:07:19 Starting #6/2
09:07:20 Starting #7/2
09:07:20 Starting #8/2
09:07:21 Starting #10/2
09:07:21 Starting #9/2
The expected output should be something like this (I wrote it by hand):
09:07:12 Starting #1/1
09:07:12 Starting #2/1
09:07:13 Starting #3/1
09:07:13 Starting #4/1
09:07:14 Starting #1/2
09:07:14 Starting #2/2
09:07:15 Starting #3/2
09:07:15 Starting #4/2
09:07:16 Starting #5/1
09:07:16 Starting #6/1
09:07:17 Starting #7/1
09:07:17 Starting #8/1
09:07:18 Starting #5/2
09:07:18 Starting #6/2
09:07:19 Starting #7/2
09:07:19 Starting #8/2
09:07:20 Starting #9/1
09:07:20 Starting #10/1
09:07:22 Starting #9/2
09:07:22 Starting #10/2
For example, at the 09:07:14 mark the 1-second wait period of the failed item #1 has expired, so its second attempt should be prioritized over the first attempt of item #5.
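To make the rule concrete, here is a minimal sketch of the admission logic I have in mind, written outside of Polly. The PriorityGate class and its EnterAsync/Exit members are hypothetical names of my own, not Polly APIs. The sketch only shows the "retries first" selection, under the assumption that each attempt calls EnterAsync before running and Exit afterwards. It does not claim to reproduce the hand-written output above exactly, because a retry whose wait expires at the same instant a slot is released may still lose the race to a fresh operation.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of Polly): a concurrency gate with two
// waiting queues. When a slot is released, a waiting retry is admitted
// before any waiting fresh operation.
public sealed class PriorityGate
{
    private readonly object _lock = new object();
    private readonly Queue<TaskCompletionSource<bool>> _retries =
        new Queue<TaskCompletionSource<bool>>();
    private readonly Queue<TaskCompletionSource<bool>> _fresh =
        new Queue<TaskCompletionSource<bool>>();
    private int _available;

    public PriorityGate(int maxParallelization)
    {
        _available = maxParallelization;
    }

    // Completes immediately if a slot is free; otherwise the caller waits
    // in the retry queue or the fresh queue, depending on isRetry.
    public Task EnterAsync(bool isRetry)
    {
        lock (_lock)
        {
            if (_available > 0)
            {
                _available--;
                return Task.CompletedTask;
            }
            var waiter = new TaskCompletionSource<bool>(
                TaskCreationOptions.RunContinuationsAsynchronously);
            (isRetry ? _retries : _fresh).Enqueue(waiter);
            return waiter.Task;
        }
    }

    // Releases a slot. The slot is handed directly to the next waiter,
    // preferring retries; it returns to the pool only if nobody waits.
    public void Exit()
    {
        TaskCompletionSource<bool> next;
        lock (_lock)
        {
            if (_retries.Count > 0) next = _retries.Dequeue();
            else if (_fresh.Count > 0) next = _fresh.Dequeue();
            else { _available++; return; }
        }
        next.SetResult(true);
    }
}
```

Each attempt would then be wrapped as: await gate.EnterAsync(isRetry: attempt > 1), run the operation inside a try block, and call gate.Exit() in the finally block. What I am looking for is the equivalent of this behavior expressed with Polly policies.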
An unsuccessful attempt to solve this problem is to reverse the order of the two policies. Unfortunately, putting the BulkheadPolicy before the WaitAndRetryPolicy results in reduced parallelization. What happens is that the BulkheadPolicy considers all retries of an item to be a single operation, and so the "wait" phase between two retries counts towards the parallelization limit. Obviously I don't want that. The documentation also makes it clear that the order of the two policies in my example is correct:
BulkheadPolicy: Usually innermost unless wraps a final TimeoutPolicy. Certainly inside any WaitAndRetry. The Bulkhead intentionally limits the parallelization. You want that parallelization devoted to running the delegate, not occupied by waits for a retry.
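For reference, the reversed ordering that I tried looks roughly like the sketch below. It uses the same Polly calls as my example above, only with the two policies swapped; reversedPolicy is just an illustrative variable name.

```csharp
using System;
using System.Net.Http;
using Polly;

// Reversed order: the bulkhead is outermost, so a bulkhead slot stays
// occupied for the whole retry sequence of an item, including the
// 1-second wait between the two attempts.
var reversedPolicy = Policy.WrapAsync
(
    Policy.BulkheadAsync(
        maxParallelization: 2, maxQueuingActions: Int32.MaxValue),
    Policy
        .Handle<HttpRequestException>()
        .WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1))
);
```

With this ordering an item holds its slot while it is merely waiting to be retried, which is the reduced parallelization described above.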
Is there any way to achieve the behavior I want, while staying in the realm of the Polly library?