
We are using Azure Table Storage and are getting occasional 408 Timeouts when performing an InsertOrMerge operation. In this case we would like to retry, but it appears that the retry policy is not being followed for these errors.

This is a class we use to handle the table interaction. The method GetFooEntityAsync tries to retrieve the entity from Table Storage. If it cannot, it creates a new FooEntity and adds it to the table (mapping to a FooTableEntity).

public class FooTableStorageBase
{
    private readonly string tableName;

    protected readonly CloudStorageAccount storageAccount;

    protected TableRequestOptions DefaultTableRequestOptions { get; }

    protected OperationContext DefaultOperationContext { get; }

    public CloudTable Table
    {
        get
        {
            return storageAccount.CreateCloudTableClient().GetTableReference(tableName);
        }
    }

    public FooTableStorageBase(string tableName)
    {
        if (String.IsNullOrWhiteSpace(tableName))
        {
            throw new ArgumentNullException(nameof(tableName));
        }

        this.tableName = tableName;

        storageAccount = CloudStorageAccount.Parse(ConnectionString);

        ServicePoint tableServicePoint = ServicePointManager.FindServicePoint(storageAccount.TableEndpoint);
        tableServicePoint.UseNagleAlgorithm = false;
        tableServicePoint.ConnectionLimit = 100; // Increasing connection limit from default of 2.

        DefaultTableRequestOptions = new TableRequestOptions()
        {
            PayloadFormat = TablePayloadFormat.JsonNoMetadata,
            MaximumExecutionTime = TimeSpan.FromSeconds(1),
            RetryPolicy = new OnTimeoutRetry(TimeSpan.FromMilliseconds(250), 3),
            LocationMode = LocationMode.PrimaryOnly 
        };

        DefaultOperationContext = new OperationContext();

        DefaultOperationContext.Retrying += (sender, args) =>
        {
            // This is never executed.
            Debug.WriteLine($"Retry policy activated in {this.GetType().Name} due to HTTP code {args.RequestInformation.HttpStatusCode} with exception {args.RequestInformation.Exception.ToString()}");
        };

        DefaultOperationContext.RequestCompleted += (sender, args) =>
        {
            if (args.Response == null)
            {
                // This is occasionally executed - we want to retry in this case.
                Debug.WriteLine($"Request failed in {this.GetType().Name} due to HTTP code {args.RequestInformation.HttpStatusCode} with exception {args.RequestInformation.Exception.ToString()}");
            }
            else
            {
                Debug.WriteLine($"{this.GetType().Name} operation complete: Status code {args.Response.StatusCode} at {args.Response.ResponseUri}");
            }
        };

        Table.CreateIfNotExists(DefaultTableRequestOptions, DefaultOperationContext);
    }

    public async Task<FooEntity> GetFooEntityAsync()
    {
        var retrieveOperation = TableOperation.Retrieve<FooTableEntity>(FooTableEntity.GenerateKey());

        var tableEntity = (await Table.ExecuteAsync(retrieveOperation, DefaultTableRequestOptions, DefaultOperationContext)).Result as FooTableEntity;

        if (tableEntity != null)
        {
            return tableEntity.ToFooEntity();
        }

        var fooEntity = CalculateFooEntity();

        var insertOperation = TableOperation.InsertOrMerge(new FooTableEntity(fooEntity));
        var executeResult = await Table.ExecuteAsync(insertOperation);

        if (executeResult.HttpStatusCode == 408)
        {
            // This is never executed.
            Debug.WriteLine("Got a 408");
        }

        return fooEntity;
    }

    public class OnTimeoutRetry : IRetryPolicy
    {
        int maxRetryAttempts = 3;

        TimeSpan defaultRetryInterval = TimeSpan.FromMilliseconds(250);

        public OnTimeoutRetry(TimeSpan deltaBackoff, int retryAttempts)
        {
            maxRetryAttempts = retryAttempts;
            defaultRetryInterval = deltaBackoff;
        }

        public IRetryPolicy CreateInstance()
        {
            // Copy this instance's settings rather than hard-coded defaults.
            return new OnTimeoutRetry(defaultRetryInterval, maxRetryAttempts);
        }

        public bool ShouldRetry(int currentRetryCount, int statusCode, Exception lastException, out TimeSpan retryInterval, OperationContext operationContext)
        {
            retryInterval = defaultRetryInterval;
            if (currentRetryCount >= maxRetryAttempts)
            {
                return false;
            }

            // Non-retryable exceptions are all 4xx ( >=400 and <500) class exceptions (Bad Request, Not Found, etc.) as well as 501 and 505.
            // This custom retry policy also retries on a 408 timeout.
            if ((statusCode >= 400 && statusCode < 500 && statusCode != 408) || statusCode == 501 || statusCode == 505)
            {
                return false;
            }

            return true;
        }
    }
}

When calling GetFooEntityAsync(), the "Request failed" line is occasionally executed. Inspecting the values at that point shows args.RequestInformation.HttpStatusCode = 408. However:

  • Debug.WriteLine("Got a 408"); within the GetFooEntityAsync method is never executed.

  • Debug.WriteLine($"Retry policy activated... within the DefaultOperationContext.Retrying delegate is never executed (I would expect this to be executed twice - is this not retrying?).

  • DefaultOperationContext.RequestResults contains a long list of results (mostly with status codes 404, some 204s).

According to this (rather old) blog post, exceptions with codes between 400 and 500, as well as 501 and 505, are non-retryable. However, a timeout (408) is exactly the situation in which we would want to retry. Perhaps I need to write a custom retry policy for this case.

I don't fully understand where the 408 is coming from, as I can't find it in the code other than when the RequestCompleted delegate is invoked. I have been trying different settings for my retry policy without luck. What am I missing here? How can I get the operation to retry on a 408 from table storage?

EDIT: I have updated the code to show the custom retry policy that I implemented, to retry on 408 errors. However, it seems that my breakpoints on retry are still not being hit, so it appears the retry is not being triggered. What could be the reason my retry policy is not being activated?

08Dc91wk
  • What is the partition/row key of foo entity? – RabtFt Sep 23 '16 at 13:25
  • You're correct. You would need to write a custom retry policy and mark 408 error code as retryable. – Gaurav Mantri Sep 23 '16 at 13:31
  • It's a string representation of a point ("latitude,longitude") generated from data in FooEntity. It works in almost all cases. I don't think it is related to the timeout. – 08Dc91wk Sep 23 '16 at 13:34
  • I was just wondering if collisions were perhaps occurring because you were using insertOrMerge. – RabtFt Sep 23 '16 at 13:52
  • Thanks Gaurav, I will give that a try. And thanks RabtFt, collisions may be occurring, but then shouldn't that be handled by InsertOrMerge? – 08Dc91wk Sep 23 '16 at 13:56
  • See if this blog post helps: http://gauravmantri.com/2012/12/30/storage-client-library-2-0-implementing-retry-policies/. – Gaurav Mantri Sep 23 '16 at 14:07
  • Hi Gaurav, your blog post was most helpful! I implemented a retry policy based on your ContainerBeingDeleted retry policy. I have updated the code to show the custom retry policy that I implemented, to retry on 408 errors. However, it seems that my breakpoints on retry are still not being hit, so it appears the retry is not being triggered. What could be the reason my retry policy is not being activated? – 08Dc91wk Sep 26 '16 at 07:21
  • Ivan, current SDK is configured to retry on HTTP 408. You don't need to implement a custom policy to trigger the retry. Please see: https://github.com/Azure/azure-storage-net/blob/68c3ee55a3a6f62a0159cea58005d3fe027312a8/Lib/Common/RetryPolicies/ExponentialRetry.cs – sguler Feb 03 '17 at 18:19
  • Also, the reason you don't get "Got a 408" printed is that an exception is thrown when a 408 is received. Are you catching the exception somewhere? – sguler Feb 03 '17 at 18:21
  • You're setting MaximumExecutionTime to 1 second which may not be sufficient for retries – yonisha Jun 02 '17 at 21:23
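
The comment above about an exception being thrown on a 408 can be checked with a small wrapper. This is only a sketch, assuming the Microsoft.WindowsAzure.Storage package used in the question; TableTimeoutProbe and ExecuteAndLogTimeoutAsync are hypothetical names introduced for illustration and are not part of the SDK.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical helper, not part of the storage SDK.
public static class TableTimeoutProbe
{
    // Runs the operation with the given options and context, and logs when the
    // failure surfaces as a StorageException carrying status code 408.
    public static async Task<TableResult> ExecuteAndLogTimeoutAsync(
        CloudTable table,
        TableOperation operation,
        TableRequestOptions options,
        OperationContext context)
    {
        try
        {
            return await table.ExecuteAsync(operation, options, context);
        }
        catch (StorageException ex)
            when (ex.RequestInformation != null && ex.RequestInformation.HttpStatusCode == 408)
        {
            // The 408 arrives here as an exception, not as TableResult.HttpStatusCode.
            Debug.WriteLine($"Got a 408: {ex.Message}");
            throw; // rethrow so the caller still sees the failure
        }
    }
}
```

If the log line appears, the 408 is reaching the caller as an exception, which would explain why a check on executeResult.HttpStatusCode never fires.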

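
The comment about MaximumExecutionTime can be made concrete with some rough arithmetic. The sketch below is a simplified model, not the storage SDK's actual accounting: it assumes MaximumExecutionTime bounds the whole operation including retries, that every attempt takes the same time, and that the backoff between attempts is fixed. RetryBudget and AttemptsThatFit are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical helper for reasoning about a retry time budget.
public static class RetryBudget
{
    // Counts how many attempts (the first try plus retries) can start before
    // the overall budget runs out, assuming each attempt takes attemptDuration
    // and each retry is preceded by a fixed backoff.
    public static int AttemptsThatFit(
        TimeSpan maxExecutionTime,
        TimeSpan attemptDuration,
        TimeSpan backoff,
        int maxRetries)
    {
        TimeSpan elapsed = TimeSpan.Zero;
        int attempts = 0;

        while (attempts <= maxRetries)
        {
            if (attempts > 0)
            {
                elapsed += backoff; // wait before every retry
            }

            if (elapsed >= maxExecutionTime)
            {
                break; // no time left to start another attempt
            }

            attempts++;
            elapsed += attemptDuration;
        }

        return attempts;
    }
}
```

With the values from the question (a 1 second budget, a 250 ms backoff, and up to 3 retries), AttemptsThatFit returns 1 when each attempt takes 800 ms, so under this model no retry ever starts. When each attempt takes 200 ms, it returns 3.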