4

Anyone have any idea on why the policy below stop retry after 3 times instead of 10 ?

IAsyncPolicy<HttpResponseMessage> httpWaitAndRetryPolicy =
     Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
          .OrHandle<Exception>(r => true)
          .WaitAndRetryAsync(10, retryAttempt => TimeSpan.FromSeconds(2));

I set the retry attempt to 10 and test the http post call with BadRequest failure. But it only retry 3 times and then stopped until timeout and threw exception

----> System.Threading.Tasks.TaskCanceledException : A task was canceled.
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at HttpRetry.Lab.Tests.ServiceTest.WhenPostWrongAlert_ThenRecoversProperly() in C:\ServiceTest.cs:line 56
--TaskCanceledException


15:57:03.6367  INFO HttpClientProvider - Configuring client xxxxxxxx:1234/api/" timeout=00:02:00
15:57:03.6636  INFO Service            - POST xxxx/xxxxxxx
15:57:04.2051  INFO HttpClientProvider - Retrying retryCount=1 sleepDuration=00:00:02 result=Polly.DelegateResult`1[System.Net.Http.HttpResponseMessage]
15:57:06.6880  INFO HttpClientProvider - Retrying retryCount=2 sleepDuration=00:00:02 result=Polly.DelegateResult`1[System.Net.Http.HttpResponseMessage]
15:59:03.6811  INFO HttpClientProvider - Retrying retryCount=3 sleepDuration=00:00:02 result=Polly.DelegateResult`1[System.Net.Http.HttpResponseMessage]
15:59:03.6811 ERROR ServiceTest - System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at HttpRetry.Lab.Service.<PostAsync>d__4.MoveNext() in C:\Service.cs:line 38
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at HttpRetry.Lab.Tests.ServiceTest.<PostAsync>d__4.MoveNext() in C:\ServiceTest.cs:line 27

var serviceProvider = serviceConnection.AddHttpClient(connection.Name, c =>
        {
            c.BaseAddress = new Uri(connection.BaseUrl);
            c.DefaultRequestHeaders.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes($"{connection.UserName}:{connection.Password}")));
            c.Timeout = connection.Timeout; // Timeout is TimeSpan.FromSeconds(120)
        })
    .AddPolicyHandler(httpWaitAndRetryPolicy)
    .Services.BuildServiceProvider();

HttpClientFactories.Add(connection.Name, serviceProvider.GetService<IHttpClientFactory>());

Confirmed the root cause of the issue: I don't know what causes the symptom but looks like the request connection will not be released unless explicitly calling Dispose HttpResponseMessage OnRetry. The current solution is setup OnRetry in WaitAndRetryAsync and dispose the resposne. Everything's working fine without the needs of changing ServicePointManager.DefaultConnectionLimit

Peter Csala
  • 17,736
  • 16
  • 35
  • 75
cscmh99
  • 2,701
  • 2
  • 15
  • 18
  • How did you set the timeout (via Timeout policy / via cancellationToken / via HttpClient's "global" timeout / other way)? What does it mean `CancelRequest` status code in this case? Can you please share with us the `Execute` code as well? – Peter Csala Jul 09 '20 at 06:04
  • I have updated more information here. Will try to capture the exception once the remote endpoint is back on live. – cscmh99 Jul 09 '20 at 14:38
  • @PeterCsala I updated everything in my post. Please take a look. The issue is haunting me for quite awhile now – cscmh99 Jul 10 '20 at 08:12
  • Your exception message is out-of-sync with the code. Exception states 2 mintutes global timeout, where config comments says 1 min. Exception states 2 seconds sleepDuration, whereas policy setup says 5 seconds. – Peter Csala Jul 10 '20 at 09:55
  • @PeterCsala thanks for pointing that out. Things are inconsistent after multiple try-an-error. Will update the question shortly – cscmh99 Jul 10 '20 at 10:21

2 Answers2

5

Your timeout

As I can see you have a global 1 minute timeout on HttpClient level. That will throw a TaskCanceledException even though you might expect TimeoutException.

If you wish to receive TimeoutException then you have to specify a request based timeout via the RequestTimeout property of HttpRequestMessage. For further details please check the following link.

Your retry

Your retry logic defines 3 (or 10) retries with 5 seconds penalty. 3 retries means 4 attempts, because there is an initial (0th) request, which is out-side of the retry scope. If that fails then the 1st retry will become the 2nd attempt.

So the flow will look like this:

  1. Initial request sent out << 1st attempt
  2. Initial request failed
  3. Retry logic is triggered
  4. 5 seconds penalty is exposed
  5. 1st retry logic fires << 2nd attempt
  6. 2nd attempt failed
  7. Retry logic is triggered
  8. 5 seconds penalty is exposed
  9. 2nd retry logic fires << 3rd attempt
  10. ...

If all of this can't finish in under a minute then HttpClient will throw a TaskCanceledExpcetion due to global timeout.

Polly's timeout policy

Polly supports local and global timeout policies as well. There is a single Timeout policy, which can be used in both ways.

It can act as a local timeout if your timeout policy is wrapped inside the retry: retryPolicy.WrapAsync(timeoutPolicy);

It can act as a global timeout if your retry policy is wrapped inside the timeout: timeoutPolicy.WrapAsync(retryPolicy);

Of course, you can have global and local timeouts as well at the same time: Policy.WrapAsync(globalTimeoutPolicy, retryPolicy, localTimeoutPolicy);

I would highly encourage you to consider to use Polly's Timeout instead of HttpClient's Timeout in order to have a single place where you define your resilient strategy.

Please bear in mind that Timeout policy will throw TimeoutRejectedException if the timeout elapsed without a response. Because your retry handles all sort of Exceptions (.OrHandle<Exception>()), that's why you don't need to modify your retry policy.

Polly's transient failure's error handling

There is a nuget package called Microsoft.Extensions.Http.Polly (1), which defines several useful utilities. One of them is the HttpPolicyExtensions.HandleTransientHttpError()

It catches HttpRequestException and checks whether the response's status code is either 5xx or 408 (RequestTimeout).

It might worth considering to use this as well.

Debugging policies

Each of different policies define callback to provide ability to get insight how they work. In case of retry it is called onRetry or onRetryAsync for sync or async retry respectively. By providing the following delegate inside your WaitAndRetryAsync you can get really useful information:

onRetryAsync: (exception, delay, times, context) => {
  //TODO: logging
}
more urgent jest
  • 362
  • 4
  • 11
Peter Csala
  • 17,736
  • 16
  • 35
  • 75
  • Look like my colleague finally get to the bottom of this by examine the code in Polly. The problem is on Microsoft default value on ServicePointManager.DefaultConnectionLimit = 2. For me, I can only see the 2nd retry. So initial call + 1st retry used up the connection limit and the 2nd retry will just being starve until timeout. I will post more update on this topic later on. – cscmh99 Jul 10 '20 at 11:28
  • i'm wondering if this is because I set the HttpCompletionOption.ResponseHeadersRead at my HttpClient.Post. So the connection never get released as I never query the ResponseHeader. – cscmh99 Jul 10 '20 at 11:38
  • @cscmh99 Sorry to say, but `ServicePointManager.DefaultConnectionLimit` has nothing to do with Polly. It controls the outgoing concurrent Http request count. It varies based on the Host machine type (the value is different for Workstation and different for Windows Server). BTW: Which version of .NET Core are you using? And on what machine are you tesing? In case of Linux there is another property which controls this. – Peter Csala Jul 10 '20 at 11:40
  • I'm using .net framework 4.8. Cannot migrate to .net core as some external dependencies don't have net core version yet. Thanks a lot. Looks like the timeoutPolicy did release the http connection for the subsequent retry call on limited concurrent connection. I need to do more testing in order to make sure all retry goes through to rest api but at least the basic logic looks a lot better now – cscmh99 Jul 12 '20 at 04:27
  • @cscmh99 I'm glad to hear that. :D – Peter Csala Jul 13 '20 at 06:07
  • @cscmh99 Did you manage to test thoroughly your solution with timeoutPolicy? – Peter Csala Jul 14 '20 at 14:36
  • I don't know what causes the symptom but looks like the request connection will not be released unless explicitly calling Dispose HttpResponseMessage OnRetry. The problem is solved completely once I call HttpResponseMessage.Result.Dispose() on retry. Don't even need the timeout policy anymore. – cscmh99 Jul 14 '20 at 14:44
  • @cscmh99 In this case could be please provide an answer for your own question and close this topic? – Peter Csala Jul 15 '20 at 09:19
3

As a result, our team found out that the Polly Retry policy will not release http request connection until the HttpResponseMessage has been used. To be exact, this has nothing to do with Polly Retry, it's just the connection will not be released until the original HttpClient.SendAsync getting return. And the retry policy kinda delay that happen because of WaitAndRetry. End up the Polly Retry policy (e.g. x times) could end up using x+1 concurrent http connection per BadRequest.

There's 2 ways to "use" the HttpResponseMessage. Either by explicitly call response.Result.Dispose or performing some sort of Read on the response Content. e.g. response.Result.ReadAsAsync. Well, one more way is to wait for the httpClient timeout but I believe that's not really what we want. Here's the code that get things working. The key is the HttpResponse.Dispose in OnRetry

    ServicePointManager.DefaultConnectionLimit = appConfiguration.HttpClients.ConnectionLimit;

    IAsyncPolicy<HttpResponseMessage> httpWaitAndRetryPolicy =
        Policy.Handle<HttpRequestException>()
              .Or<Exception>()
              .OrResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
              .WaitAndRetryAsync(3, 
                                 retryAttempt => TimeSpan.FromSeconds(appConfiguration.HttpClients.RetryFactor * retryAttempt), 
                                 OnRetry);

    IAsyncPolicy<HttpResponseMessage> timeoutPolicy = Policy.TimeoutAsync<HttpResponseMessage>(appConfiguration.HttpClients.RequestTimeout);

    foreach (var endpoint in appConfiguration.HttpClients.HttpEndpoints)
    {
        var serviceProvider = serviceConnection.AddHttpClient(endpoint.Name,
                              c =>
                              {
                                  c.BaseAddress = new Uri(endpoint.BaseUrl);
                                  c.DefaultRequestHeaders.Authorization   = new System.Net.Http.Headers.AuthenticationHeaderValue(endpoint.AuthenticationScheme, Convert.ToBase64String(Encoding.ASCII.GetBytes($"{endpoint.UserName}:{endpoint.Password}")));
                                  c.DefaultRequestHeaders.ConnectionClose = false;
                                  c.Timeout     = endpoint.Timeout;
                              }).AddPolicyHandler(Policy.WrapAsync(httpWaitAndRetryPolicy, timeoutPolicy))
                              .Services.BuildServiceProvider();

        httpClientFactories.Add(endpoint.Name, serviceProvider.GetService<IHttpClientFactory>());
    }

    private Task OnRetry(DelegateResult<HttpResponseMessage> response, TimeSpan span, int retryCount, Context context)
    {
        if (response == null)
            return Task.CompletedTask;

        var responseResult = response.Result;
        logger.Info($"RetryCount={retryCount} IsSuccess={responseResult == null ? "" : responseResult.IsSuccessStatusCode} StatusCode={responseResult == null ? "" : responseResult.StatusCode} Exception={response.Exception?.Message}");

        response.Result?.Dispose();
        return Task.CompletedTask;
    }
Peter Csala
  • 17,736
  • 16
  • 35
  • 75
cscmh99
  • 2,701
  • 2
  • 15
  • 18