
WHAT DO I HAVE NOW?

Currently, I have a client configured with a RetryAsync policy that uses a primary address and on failure switches to a failover address. The connection details are read from a secrets manager.

services
    .AddHttpClient ("MyClient", client => client.BaseAddress = PlaceholderUri)
    .ConfigureHttpMessageHandlerBuilder (builder => {

        // loads settings from secret manager
        var settings = configLoader.LoadSettings().Result;

        builder.PrimaryHandler = new HttpClientHandler {
            Credentials = new NetworkCredential (settings.Username, settings.Password),
            AutomaticDecompression = DecompressionMethods.GZip
        };

        var primaryBaseAddress = new Uri (settings.Host);
        var failoverBaseAddress = new Uri (settings.DrHost);

        builder.AdditionalHandlers.Add (new PolicyHttpMessageHandler (requestMessage => {
            var relativeAddress = PlaceholderUri.MakeRelativeUri (requestMessage.RequestUri);
            requestMessage.RequestUri = new Uri (primaryBaseAddress, relativeAddress);

            return HttpPolicyExtensions.HandleTransientHttpError ()
                .RetryAsync ((result, retryCount) =>
                    requestMessage.RequestUri = new Uri (failoverBaseAddress, relativeAddress));
        }));
    });

WHAT AM I TRYING TO ACHIEVE?

In general

My client can use a primary or a failover service. When the primary is down, the failover is used until the primary is back up. When both are down, we get alerted and can change the service addresses dynamically via the secrets manager.

In code

Now I would also like to introduce a CircuitBreakerPolicy and chain the two policies together. I am looking for a configuration that is encapsulated, so that faults are handled at the client level and not in the class consuming that client.

Scenario explained

Let's assume that there is a circuit breaker policy wrapped in a retry policy with a single client.

The circuit breaker is configured to break the circuit for 60 seconds after 3 consecutive failed attempts (transient errors) on the primary base address. OnBreak - the address changes from primary to failover.

The retry policy is configured to handle BrokenCircuitException and to retry once with the address changed from primary to failover, so the call can continue; a code sketch of this chain follows the step list below.

  1. Request on primary address - 500 code
  2. Request on primary address - 500 code
  3. Request on primary address - 500 code (3 consecutive failures reached)
  4. Circuit broken for 60 seconds
  5. Request on primary address - BrokenCircuitException caught by retry policy, call failover
  6. Request on primary address - BrokenCircuitException caught by retry policy, call failover
  7. Request on primary address - BrokenCircuitException caught by retry policy, call failover
  8. Request on primary address - BrokenCircuitException caught by retry policy, call failover
  9. (after 60 secs) Circuit half-open - the next call is a trial: on failure the circuit breaks for another 60 secs, on success it closes (assume it closes)
  10. Request on primary address - 200 code
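
Sketched as code, the chain I have in mind looks roughly like this (a sketch of intent only, not working code; UseFailoverAddress and UsePrimaryAddress are hypothetical helpers that would swap the base address):

// Sketch of the intended chain. BrokenCircuitException lives in Polly.CircuitBreaker;
// HttpPolicyExtensions comes from Microsoft.Extensions.Http.Polly.
var circuitBreaker = HttpPolicyExtensions
    .HandleTransientHttpError()
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 3,
        durationOfBreak: TimeSpan.FromSeconds(60),
        onBreak: (outcome, breakDelay) => UseFailoverAddress(), // hypothetical helper
        onReset: () => UsePrimaryAddress());                    // hypothetical helper

// The outer retry reacts to an open circuit by retrying once,
// which should then hit the failover address.
var retryOnBrokenCircuit = Policy<HttpResponseMessage>
    .Handle<BrokenCircuitException>()
    .RetryAsync(1);

var strategy = Policy.WrapAsync(retryOnBrokenCircuit, circuitBreaker);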

As described in this article, there is a solution using a breaker wrapped in a fallback, but as you can see there, the default and fallback logic are implemented in the class and not at the client level.

I would like

public class OpenExchangeRatesClient
{
    private readonly HttpClient _client;
    private readonly AsyncPolicy<ExchangeRates> _policy;
    public OpenExchangeRatesClient(string apiUrl)
    {
        _client = new HttpClient
        {
            BaseAddress = new Uri(apiUrl),
        };

        var circuitBreaker = Policy
            .Handle<Exception>()
            .CircuitBreakerAsync(
                exceptionsAllowedBeforeBreaking: 2,
                durationOfBreak: TimeSpan.FromMinutes(1)
            );

        // Async policies are combined with WrapAsync; the generic
        // fallback returns the fallback rates on failure.
        _policy = Policy<ExchangeRates>
            .Handle<Exception>()
            .FallbackAsync(_ => GetFallbackRates())
            .WrapAsync(circuitBreaker);
    }

    public Task<ExchangeRates> GetLatestRates()
    {
        return _policy
            .ExecuteAsync(() => CallRatesApi());
    }

    public Task<ExchangeRates> CallRatesApi()
    {
        //call the API, parse the results
    }

    public Task<ExchangeRates> GetFallbackRates()
    {
        // load the rates from the embedded file and parse them
    }
}

to be rewritten as

public class OpenExchangeRatesClient 
{
    private readonly HttpClient _client;
    public OpenExchangeRatesClient (IHttpClientFactory clientFactory) {
        _client = clientFactory.CreateClient ("MyClient");
    }

    public Task<HttpResponseMessage> GetLatestRates () {
        return _client.GetAsync ("/rates-gbp-usd");
    }
}

WHAT HAVE I READ?

WHAT HAVE I TRIED?

I have tried a few different scenarios to chain and combine a circuit breaker policy with a retry policy to achieve the desired goal at the client level in the Startup file. The last state was the code below. The policies are wrapped in an order where the retry should be able to catch a BrokenCircuitException, but this has not been the case. The exception is thrown in the consumer class, which is not the desired result: although the retry policy is triggered, the exception still reaches the consumer class.

var retryPolicy = GetRetryPolicy();
var circuitBreaker = GetCircuitBreakerPolicy();

var policyWrapper = Policy.WrapAsync(retryPolicy, circuitBreaker);

services
    .AddHttpClient("TestClient", client => client.BaseAddress = GetPrimaryUri())
    .AddPolicyHandler(policyWrapper);

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(
            3,
            TimeSpan.FromSeconds(45),
            OnBreak,
            OnReset, 
            OnHalfOpen);
}

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy<HttpResponseMessage>
        .Handle<Exception>()
        .RetryAsync(1, (response, retryCount) =>
        {
            Debug.WriteLine("Retries on broken circuit");
        });
}

I have left out the methods OnBreak, OnReset and OnHalfOpen since they just print messages.
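
For completeness, they are roughly of this shape (reconstructed; only the log text matters, and the OnReset/OnHalfOpen messages are assumed):

static void OnBreak(DelegateResult<HttpResponseMessage> result, TimeSpan breakDelay)
    => Debug.WriteLine("Circuit broken (after 3 attempts)");

static void OnReset() => Debug.WriteLine("Circuit reset");         // assumed message

static void OnHalfOpen() => Debug.WriteLine("Circuit half-open");  // assumed message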

UPDATE: Added Logs from Console.

Circuit broken (after 3 attempts)
Retries on broken circuit
Exception thrown: 'System.AggregateException' in System.Private.CoreLib.dll
Retries on broken circuit
Exception thrown: 'System.AggregateException' in System.Private.CoreLib.dll

'CircuitBreakerPolicy.exe' (CoreCLR: clrhost): Loaded 'C:\Program ...
Retries on broken circuit
Exception thrown: 'System.AggregateException' in System.Private.CoreLib.dll

UPDATE 2: Added reference URL to the class making use of the client with policies configured

UPDATE 3: The project has been updated so that the implementation of WeatherService2.Get works in the desired way: when the primary service is unavailable the circuit is broken, and the failover service is used until the circuit is closed again. That would be the answer to this question; however, I would like to explore a solution where the same outcome is achieved using WeatherService.Get with the appropriate policy and client setup in Startup.

Reference to class using the client. Reference to project using the class.

In the logs above you can see Exception thrown: 'System.AggregateException' in System.Private.CoreLib.dll, which is thrown by the circuit breaker - that is not expected, since there is a retry wrapping the circuit breaker.

Vergil C.
  • Your `GetRetryPolicy` sets the retryCount to 1. That means there will be 2 attempts (the initial and the retry). So your Circuit Breaker won't break because there are no 3 consecutive failed attempts. – Peter Csala Mar 22 '21 at 07:23
  • The CircuitBreaker is breaking after 3 attempts, and then the retry is triggered since a BrokenCircuitException is thrown. The problem is that the exception is also thrown in the consuming class; I expect it to be handled by the retry policy which wraps the circuit breaker policy. Modifying the post to add logs. – Vergil C. Mar 22 '21 at 13:27
  • Can you share with us the consuming side as well? Can you also share with us what's inside the `AggregateException` (what is the `InnerException`)? – Peter Csala Mar 23 '21 at 16:04
  • Uploaded the project to a public repo and added references to the class using the client and also to the whole project. I also debugged by enabling breaking on all exceptions, and the `AggregateException` has inner exception `Polly.CircuitBreaker.BrokenCircuitException`. – Vergil C. Mar 24 '21 at 14:42
  • Does my second post's explanation give you clarity? – Peter Csala Apr 19 '21 at 06:30
  • I haven't had the chance yet to look at your second post; I will try to do so soon and answer. – Vergil C. May 25 '21 at 15:27
  • @VergilC. Hello sir. I checked the question and answers but haven't checked your repo yet, so I wanted to ask: can we say the solution in the repo you shared works fine for this particular case? I am trying to achieve the same thing in my project: a base address and a failover address for the CB. – jack Jan 17 '23 at 07:47

2 Answers


I've downloaded your project and played with it, so here are my observations:

Blocking vs Non-blocking

  • Because your code blocks on the async call (.Result), you see an AggregateException:
public IEnumerable<WeatherForecast> Get()
{
    HttpResponseMessage response = null;
    try
    {
        response = _client.GetAsync(string.Empty).Result; //AggregateException  
    }
    catch (Exception e)
    {
        Debug.WriteLine($"{e.Message}");
    }
    ...
}
  • In order to get the original exception instead of the wrapping AggregateException, you need to use await:
public async Task<IEnumerable<WeatherForecast>> Get()
{
    HttpResponseMessage response = null;
    try
    {
        response = await _client.GetAsync(string.Empty); //BrokenCircuitException
    }
    catch (Exception e)
    {
        Debug.WriteLine($"{e.Message}");
    }
    ...
}

Escalation

Whenever you wrap one policy inside another, escalation might happen: if the inner policy can't handle the problem, it propagates it to the outer policy, which may or may not be able to handle it. If the outermost policy does not handle the problem, then (most of the time) the original exception is thrown to the consumer of the resilience strategy (the combination of policies).

Here you can find more details about escalation.
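
To make this concrete before looking at your code, here is a minimal standalone sketch (not taken from your project): the inner breaker escalates by throwing a BrokenCircuitException, and only an outer policy that handles that exception type can react to it.

// Inner policy: breaks after a single handled exception.
var inner = Policy
    .Handle<HttpRequestException>()
    .CircuitBreakerAsync(1, TimeSpan.FromSeconds(30));

// Outer policy: handles the original exception and the escalated
// BrokenCircuitException, so it can react when the inner one gives up.
var outer = Policy
    .Handle<HttpRequestException>()
    .Or<BrokenCircuitException>()
    .RetryAsync(2);

var strategy = outer.WrapAsync(inner);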

Let's review this concept in your case:

var policyWrapper = Policy.WrapAsync(retryPolicy, circuitBreaker);

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(3, TimeSpan.FromSeconds(45), ...);
}

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy<HttpResponseMessage>
        .Handle<Exception>()
        .RetryAsync(1, ...);
}
  1. The initial request (1st attempt) is issued against https://httpstat.us/500
  2. It returns 500, which increases the consecutive transient failure count from 0 to 1
  3. CB escalates the problem to retry
  4. Retry does not handle status 500, so no retry is triggered
  5. httpClient returns an HttpResponseMessage with InternalServerError status code.

Let's modify the retry policy to handle transient http errors as well:

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(3, TimeSpan.FromSeconds(45), ...);
}

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .Or<Exception>()
        .RetryAsync(1, ...);
}
  1. The initial request (1st attempt) is issued against https://httpstat.us/500
  2. It returns 500, which increases the consecutive transient failure count from 0 to 1
  3. CB escalates the problem to retry
  4. Retry handles status 500, so it issues another attempt immediately
  5. The 1st retry request (2nd attempt) is issued against https://httpstat.us/500
  6. It returns 500, which increases the consecutive transient failure count from 1 to 2
  7. CB escalates the problem to retry
  8. Even though Retry handles status 500, it will not trigger because it has reached its retryCount (1)
  9. httpClient returns an HttpResponseMessage with InternalServerError status code.

Now, let's lower the consecutive failure count from 3 to 1 and handle BrokenCircuitException explicitly:

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(1, TimeSpan.FromSeconds(45), ...);
}

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .Or<BrokenCircuitException>()
        .RetryAsync(1, ...);
}
  1. The initial request (1st attempt) is issued against https://httpstat.us/500
  2. It returns 500, which increases the consecutive transient failure count from 0 to 1
  3. The Circuit Breaker opens because it has reached the predefined threshold
  4. CB escalates the problem to retry
  5. Retry handles status 500, so it issues another attempt immediately
  6. The 1st retry request (2nd attempt) is issued against https://httpstat.us/500
  7. CB prevents this call because the circuit is open
  8. CB throws a BrokenCircuitException
  9. Even though Retry handles BrokenCircuitException, it will not trigger because it has reached its retryCount (1)
  10. Retry throws the original exception (BrokenCircuitException), so httpClient's GetAsync will throw that one (see the sketch below).
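
On the consuming side this surfaces like so (a sketch, assuming the await-based Get shown earlier):

try
{
    var response = await _client.GetAsync(string.Empty);
}
catch (BrokenCircuitException) // from Polly.CircuitBreaker
{
    // The single retry is already spent, so the escalated
    // exception reaches the caller.
}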

Finally let's increase the retryCount from 1 to 2:

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(1, TimeSpan.FromSeconds(45), ...);
}

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .Or<BrokenCircuitException>()
        .RetryAsync(2, ...);
}
  1. The initial request (1st attempt) is issued against https://httpstat.us/500
  2. It returns 500, which increases the consecutive transient failure count from 0 to 1
  3. The Circuit Breaker opens because it has reached the predefined threshold
  4. CB escalates the problem to retry
  5. Retry handles status 500, so it issues another attempt immediately
  6. The 1st retry request (2nd attempt) is issued against https://httpstat.us/500
  7. CB prevents this call because the circuit is open
  8. CB throws a BrokenCircuitException
  9. Retry handles BrokenCircuitException and has not exceeded its retryCount, so it issues another attempt immediately
  10. The 2nd retry request (3rd attempt) is issued against https://httpstat.us/500
  11. CB prevents this call because the circuit is open
  12. CB throws a BrokenCircuitException
  13. Even though Retry handles BrokenCircuitException, it will not trigger because it has reached its retryCount (2)
  14. Retry will throw the original exception (BrokenCircuitException), so httpClient's GetAsync will throw that one.

I hope this exercise helped you better understand how to create a resilience strategy, where multiple policies are combined and problems escalate between them.
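
For reference, wiring this final combination onto the named client would look roughly like this (a sketch reusing the helper methods above):

// Retry (outer) wraps the circuit breaker (inner) and is attached to the client,
// so the consumer only sees the combined behavior.
services
    .AddHttpClient("TestClient", client => client.BaseAddress = GetPrimaryUri())
    .AddPolicyHandler(Policy.WrapAsync(GetRetryPolicy(), GetCircuitBreakerPolicy()));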

Peter Csala
  • @VergilC. Is my explanation clear enough or do you need further explanation? – Peter Csala Mar 29 '21 at 07:59
  • Sorry for the delayed response, had not had a chance to look at your reply till today. Yes, it is very clear what is happening now, thank you very much for taking the time to investigate and provide such a detailed answer. I am going to adjust the code based on your example to play around with it, understand, and move towards the next piece of the puzzle exploring if the client's handler can be adjusted to try on a different URL, but yes, your answer has cleared up the fog. – Vergil C. Mar 29 '21 at 17:27
  • @VergilC. If you think my post is helpful then please consider supporting my work by upvoting. If you have further questions please feel free to ask. – Peter Csala Mar 30 '21 at 06:13
  • On your last example, at `5. Retry handles status 500, so it issues another attempt immediately` - is there a way to configure it so that the request URI is changed before the retry? I am now trying to achieve my initial goal - "My client can use a primary or a failover service. When the primary is down, the failover is used until the primary is back up. When both are down, we get alerted and can change the service addresses dynamically via the secrets manager." I have tried to register the client with 2 policies using `builder.AdditionalHandlers.Add()` but that would not work. – Vergil C. Apr 06 '21 at 08:45
  • @VergilC. [Here](https://stackoverflow.com/questions/66763717/polly-to-change-query-string-on-retry) I have detailed how you can change the url for each retry attempt. Is it a suitable for you? – Peter Csala Apr 06 '21 at 08:58
  • Thanks, but that's not what I had in mind. I updated the project to demonstrate what I am after. In the controller, I added a second endpoint, which can be accessed on `/weatherforecast/2`. That uses the implementation of `WeatherService2`, which is injected with two clients, primary and failover. A policy wrapper is used which will fall back to the failover client when the circuit is broken (or on any exception in this case). Contd.. – Vergil C. Apr 06 '21 at 19:08
  • The other endpoint `/weatherforecast/1` uses the `WeatherService` implementation, which uses a client with no policies injected. Ideally, that single client would have the same policies wrapped and attached to it in the Startup, so that the behaviour of `WeatherService` is the same as that of `WeatherService2` but with the policies abstracted away, the service being injected only with an IHttpClientFactory. – Vergil C. Apr 06 '21 at 22:07
  • @VergilC. With typed client you can separate the fallback logic from the service tier. Are you familiar with that concept or should I post a sample? – Peter Csala Apr 07 '21 at 08:11
  • I am aware of the typed client concept, but am not sure how that would help. A sample would help, thanks! – Vergil C. Apr 07 '21 at 14:29
  • @VergilC. Please check my other post where I've detailed the problems with your updated version and how you can fix them. – Peter Csala Apr 08 '21 at 08:57

I've reviewed your alternative solution, which has the same design issue as the one discussed in my previous post.

public WeatherService2(IHttpClientFactory clientFactory, IEnumerable<IAsyncPolicy<HttpResponseMessage>> policies)
{
    _primaryClient = clientFactory.CreateClient("PrimaryClient");
    _failoverClient = clientFactory.CreateClient("FailoverClient");
    _circuitBreaker = policies.First(p => p.PolicyKey == "CircuitBreaker");

    _policy = Policy<HttpResponseMessage>
        .Handle<Exception>()
        .FallbackAsync(_ => CallFallbackForecastApi())
        .WrapAsync(_circuitBreaker);
}

public async Task<string> Get()
{
    var response = await _policy.ExecuteAsync(async () => await CallForecastApi());

    if (response.IsSuccessStatusCode) 
        return response.StatusCode.ToString();

    response = await CallFallbackForecastApi();
    return response.StatusCode.ToString();
}

Your Fallback policy is never triggered.

  1. HttpClient receives a response with statusCode 500
  2. CircuitBreaker breaks
  3. CB propagates the HttpResponseMessage with statusCode 500 to the outer policy
  4. Fallback does not trigger because it was set up for exceptions only (Handle<Exception>())
  5. Policy returns the HttpResponseMessage with statusCode 500
  6. Your code manually examines the response and then manually calls the fallback.

If you change your policy to this:

_policy = Policy
    .HandleResult<HttpResponseMessage>(response => response != null && !response.IsSuccessStatusCode)
    .Or<Exception>()
    .FallbackAsync(_ => CallFallbackForecastApi())
    .WrapAsync(_circuitBreaker);

then there is no need for the manual fallback; a simplified Get is sketched after the step list.

  1. HttpClient receives a response with statusCode 500
  2. CircuitBreaker breaks
  3. CB propagates the HttpResponseMessage with statusCode 500 to the outer policy
  4. Fallback triggers because it is now set up for unsuccessful status codes as well
  5. HttpClient receives a response with statusCode 200 from the fallback call
  6. Policy returns the HttpResponseMessage with statusCode 200
There is one more important thing that you need to understand: the previous code only works because you have registered the HttpClients without the circuit breaker policy.

That means the CB is not attached to the HttpClient. So, if you change the code like this:

public async Task<HttpResponseMessage> CallForecastApi()
    => await _primaryClient.GetAsync("https://httpstat.us/500/");

public async Task<HttpResponseMessage> CallFallbackForecastApi()
    => await _primaryClient.GetAsync("https://httpstat.us/200/");

then even though the CircuitBreaker will be Open after the first attempt, CallFallbackForecastApi will not throw a BrokenCircuitException, because the standalone breaker is not attached to the HttpClient.

BUT if you attach the CB to the HttpClient like this:

services
    .AddHttpClient("PrimaryClient", client => client.BaseAddress = PlaceholderUri)
    ...
    .AddPolicyHandler(GetCircuitBreakerPolicy());

and then you simplify the WeatherService2 like this:

private readonly HttpClient _primaryClient;
private readonly IAsyncPolicy<HttpResponseMessage> _policy;

public WeatherService2(IHttpClientFactory clientFactory)
{
    _primaryClient = clientFactory.CreateClient("PrimaryClient");
    _policy = Policy
        .HandleResult<HttpResponseMessage>(response => response != null && !response.IsSuccessStatusCode)
        .Or<Exception>()
        .FallbackAsync(_ => CallFallbackForecastApi());
}

then it will miserably fail with a BrokenCircuitException.


If your WeatherService2 looked like this:

public class WeatherService2 : IWeatherService2
{
    private readonly HttpClient _primaryClient;
    private readonly HttpClient _secondaryClient;
    private readonly IAsyncPolicy<HttpResponseMessage> _policy;
    public WeatherService2(IHttpClientFactory clientFactory)
    {
        _primaryClient = clientFactory.CreateClient("PrimaryClient");
        _secondaryClient = clientFactory.CreateClient("FailoverClient");

        _policy = Policy
            .HandleResult<HttpResponseMessage>(response => response != null && !response.IsSuccessStatusCode)
            .Or<Exception>()
            .FallbackAsync(_ => CallFallbackForecastApi());
    }

    public async Task<string> Get()
    {
        var response = await _policy.ExecuteAsync(async () => await CallForecastApi());
        return response.StatusCode.ToString();
    }

    public async Task<HttpResponseMessage> CallForecastApi()
        => await _primaryClient.GetAsync("https://httpstat.us/500/");

    public async Task<HttpResponseMessage> CallFallbackForecastApi()
        => await _secondaryClient.GetAsync("https://httpstat.us/200/");
}

then it would work fine, but only if the PrimaryClient and FailoverClient have different circuit breakers:

services
    .AddHttpClient("PrimaryClient", client => client.BaseAddress = PlaceholderUri)
    ...
    .AddPolicyHandler(GetCircuitBreakerPolicy());

services
    .AddHttpClient("FailoverClient", client => client.BaseAddress = PlaceholderUri)
    ...
    .AddPolicyHandler(GetCircuitBreakerPolicy());

If they shared the same circuit breaker, then the second call would fail again with a BrokenCircuitException:

var cbPolicy = GetCircuitBreakerPolicy();

services
    .AddHttpClient("PrimaryClient", client => client.BaseAddress = PlaceholderUri)
    ...
    .AddPolicyHandler(cbPolicy);

services
    .AddHttpClient("FailoverClient", client => client.BaseAddress = PlaceholderUri)
    ...
    .AddPolicyHandler(cbPolicy);
Peter Csala