What is the correct way to implement a Retry + Timeout + CircuitBreaker resilience strategy with Polly .Net?

Question

I'm trying to implement a custom resilience policy for an API, based on these Polly .NET Policies: Retry + Timeout + Circuit Breaker.

A. To keep in mind:

I don't want to use Policy Wraps. This is not necessary at this point. I just want to use a nested policy. Later I will consider wrapping.
I need to do something simple, just for testing / trying the real benefits of this strategy.
I don't want to use Fallback. I want to keep the server response intact.

B. What do I intend to do?

Have a Retry Policy that retries three times in a row, waiting N seconds between retries.
Have a Timeout Policy that fails fast if HttpClient takes more than M seconds to respond.
Have a Circuit Breaker that breaks the circuit, so that no further requests are sent, if N+1 retries have failed in a row.

C. What I've coded?

Declaring some variables:

const int NUMBER_OF_RETRIES_PER_REQUEST = 3; //COUNT (not a time)
const int MAX_TIME_FOR_HTTP_REQUEST = 10; //TIME_IN_SECONDS
const int TIME_TO_WAIT_BETWEEN_RETRIES = 100; //TIME_IN_SECONDS
static HttpClient httpClient = new HttpClient()
{
    //HttpClient's own timeout, in seconds
    Timeout = TimeSpan.FromSeconds(MAX_TIME_FOR_HTTP_REQUEST)
};

Defining Timeout Policy:

var timeoutPollyPolicy = Policy.TimeoutAsync(MAX_TIME_FOR_HTTP_REQUEST / 2);

Defining Retry Policy:

var retryPollyPolicy = Policy.HandleResult<HttpResponseMessage>(
                response => { return STATUS_TO_HANDLE.Contains(response.StatusCode); })
                .WaitAndRetryAsync(
                    NUMBER_OF_RETRIES_PER_REQUEST,
                    (retryAttempt) => TimeSpan.FromSeconds(TIME_TO_WAIT_BETWEEN_RETRIES),
                    (DelegateResult<HttpResponseMessage> lastResponse,
                        TimeSpan waitTime, int retryCount, Context context) =>
                    {
                        //ACTIONS TO BE EXECUTED IN EACH RETRY
                    }
                );
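The Circuit Breaker described in section B has no definition in the code above. A minimal sketch of one, assuming Polly v7 syntax, could look like the following. The names `statusToHandle`, `failuresBeforeBreaking` and `breakDuration`, and all of their values, are placeholders and do not come from the question.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

// Assumption: status codes treated as failures (stands in for STATUS_TO_HANDLE).
HttpStatusCode[] statusToHandle =
{
    HttpStatusCode.InternalServerError,
    HttpStatusCode.ServiceUnavailable,
    HttpStatusCode.GatewayTimeout
};

// Assumption: break after 4 consecutive handled failures and stay open for 30 seconds.
const int failuresBeforeBreaking = 4;
TimeSpan breakDuration = TimeSpan.FromSeconds(30);

// Counts both failing responses and timeouts raised by an inner Timeout policy.
AsyncCircuitBreakerPolicy<HttpResponseMessage> circuitBreakerPollyPolicy = Policy
    .HandleResult<HttpResponseMessage>(response => statusToHandle.Contains(response.StatusCode))
    .Or<TimeoutRejectedException>()
    .CircuitBreakerAsync(failuresBeforeBreaking, breakDuration);
```

While the circuit is open, executing through this policy throws `BrokenCircuitException` instead of invoking the delegate, so any policy placed outside it has to decide whether to handle that exception.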

Executing the policies:

public async Task<HttpResponseMessage>
    Send(HttpRequestMessage req, CancellationToken tokenToStop)
{
    try
    {
        //NOTE: the Timeout and Retry policies are referenced here only for this example.
        //In my real code I define these policies globally for all HttpClients.

        var httpResponse = await retryPollyPolicy.ExecuteAsync(() =>
            timeoutPollyPolicy.ExecuteAsync(async (CancellationToken cancellationToken) =>
            {
                //cancellationToken is inherent to TimeoutAsync;
                //it cancels the inner request when timeoutPollyPolicy times out
                return await httpClient.SendAsync(req, cancellationToken);

                //tokenToStop is received to stop / cancel the external request for retryPolicy
            },tokenToStop));
        
        return httpResponse;
    }
    catch (Exception ex)
    {
        //HANDLE_EXCEPTION, LOG. Rethrowing is not strictly necessary.
        throw; //rethrow without resetting the stack trace
    }
}

D. What I'm trying to define / to understand?

What must be the innermost policy in this method?

What is the difference in the CircuitBreaker behavior if I place the CircuitBreaker inside/outside the TimeoutPolicy?

Do I need to add a CancellationToken to the CircuitBreaker Policy?
Aside from readability, what other resilience benefits do I get if I use Wraps?

The ordering defines how the policy escalation chain builds up. For example, if the Timeout is the innermost policy, then each retry attempt has a separate timeout. If the Timeout is the outermost policy, then you have a single global timeout that spans all retry attempts. So it depends on what you want to achieve. What do you want to have at the end? — Peter Csala, Oct 10 '21 at 20:46
I would encourage you to check [my sample application](https://github.com/peter-csala/resilience-service-design/blob/main/ResilienceServiceDesignDemo/ResilienceServiceDesignDemo/Program.cs), which demonstrates how to chain Retry, Circuit Breaker and Timeout policies. With this ordering every retry attempt has a separate timeout. After two successive failed attempts the CB will open. Depending on the retry count, it may or may not issue new attempts after the CB has opened. — Peter Csala, Oct 11 '21 at 07:44
There seem to be a couple of points of confusion or uncertainty regarding the policies, so I would also suggest reading [this article](https://github.com/peter-csala/resilience-service-design/blob/main/resilience.md) about the different techniques. — Peter Csala, Oct 11 '21 at 07:46
@PeterCsala why does your retry policy also handle `BrokenCircuitException` (via `Or()`)? That means it keeps retrying regardless of whether the circuit breaker is open, doesn't it? — diegobarriosdev, Oct 11 '21 at 22:08
We are defining resilience strategies to overcome transient failures. If the downstream system is overloaded, we want to give it time to recover, so we won't issue any new requests against it (CB). But after that grace period we **hope** that the downstream system has normalized, so we can retry our previously failed requests. During this grace period the CB will throw `BrokenCircuitException`, but after that period it transitions into a Half-Open state, which allows us to issue a request without it being short-circuited. If that request fails, the CB goes back to Open; if it succeeds, the CB becomes Closed. — Peter Csala, Oct 12 '21 at 05:57
Here you can find some of my previous SO answers where I further elaborate this reasoning: [1](https://stackoverflow.com/questions/68333247/cannot-implicitly-convert-type-polly-circuitbreaker-asynccircuitbreaker-to-po/68336417#68336417), [2](https://stackoverflow.com/questions/62698046/waitretryforever-is-not-working-for-a-customexception-in-polly-net-resiliency/62711484#62711484), [3](https://stackoverflow.com/questions/64984030/polly-circuit-breaker-handled-and-unhandled-exceptions/65001685#65001685) — Peter Csala, Oct 12 '21 at 06:01
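The ordering discussed in these comments (Retry outermost, Circuit Breaker in the middle, Timeout innermost, with the retry also handling `BrokenCircuitException`) can be sketched with nested `ExecuteAsync` calls and no `PolicyWrap`. This is only an illustration under assumed values, not the code of the linked sample application: the thresholds, the durations, the "status code >= 500" predicate and the `createRequest` factory are all hypothetical. The factory is there because `HttpClient` refuses to send the same `HttpRequestMessage` instance twice, so each attempt needs a fresh message.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;
using Polly.Timeout;

// Innermost: every single attempt gets its own 5 second timeout (assumed value).
var timeout = Policy.TimeoutAsync(TimeSpan.FromSeconds(5));

// Middle: opens after 2 consecutive handled failures, stays open for 1 second (assumed values).
var breaker = Policy
    .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode >= 500)
    .Or<TimeoutRejectedException>()
    .CircuitBreakerAsync(2, TimeSpan.FromSeconds(1));

// Outermost: also retries when the breaker rejects the call, so that an attempt
// made after the break duration can go through in the Half-Open state.
var retry = Policy
    .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode >= 500)
    .Or<TimeoutRejectedException>()
    .Or<BrokenCircuitException>()
    .WaitAndRetryAsync(3, _ => TimeSpan.FromSeconds(1));

using var httpClient = new HttpClient();

// Each policy passes its own token to the next inner one, so the caller's token
// and the per-attempt timeout both reach HttpClient.SendAsync.
Task<HttpResponseMessage> SendAsync(Func<HttpRequestMessage> createRequest, CancellationToken stop) =>
    retry.ExecuteAsync(
        retryToken => breaker.ExecuteAsync(
            breakerToken => timeout.ExecuteAsync(
                timeoutToken => httpClient.SendAsync(createRequest(), timeoutToken),
                breakerToken),
            retryToken),
        stop);
```

A call would then look like `await SendAsync(() => new HttpRequestMessage(HttpMethod.Get, url), CancellationToken.None)`, where `url` is whatever endpoint is being tested. Moving `timeout` to the outside of `retry` instead would turn it into a single overall deadline covering all attempts, as described in the first comment.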
