
We have a requirement to purge the Azure WADLogs table on a periodic basis. We achieve this by using entity group transactions to delete records older than 15 days. The logic is as follows:

bool recordDoesNotExistExceptionOccurred = false;
CloudTable wadLogsTable = tableClient.GetTableReference(WADLogsTableName);

// WAD partition keys are "0" + tick count, so string comparison follows
// chronological order. noOfDays is negative here (e.g. -15).
string partitionKey = "0" + DateTime.UtcNow.AddDays(noOfDays).Ticks;
TableQuery<WadLogsEntity> buildQuery = new TableQuery<WadLogsEntity>().Where(
                            TableQuery.GenerateFilterCondition("PartitionKey",
                            QueryComparisons.LessThanOrEqual, partitionKey));

while (!recordDoesNotExistExceptionOccurred)
{
    // Materialize the page so the lazy query is not executed twice below.
    List<WadLogsEntity> result = wadLogsTable.ExecuteQuery(buildQuery).Take(1000).ToList();

    // Batch entity delete.
    if (result.Count > 0)
    {
         Dictionary<string, TableBatchOperation> batches = new Dictionary<string, TableBatchOperation>();
         foreach (var entity in result)
         {
             TableOperation tableOperation = TableOperation.Delete(entity);

             if (!batches.ContainsKey(entity.PartitionKey))
             {
                 batches.Add(entity.PartitionKey, new TableBatchOperation());
             }

             // A batch operation allows a maximum of 100 entities, all of which
             // must share the same PartitionKey.
             if (batches[entity.PartitionKey].Count < 100)
             {
                 batches[entity.PartitionKey].Add(tableOperation);
             }
        }

        // Execute batches.
        foreach (var batch in batches.Values)
        {
            try
            {
                await wadLogsTable.ExecuteBatchAsync(batch);
            }
            catch (Exception exception)
            {
                // Log exception here.

                // ResourceDoesNotExist is a constant holding the error text
                // "The specified resource does not exist."
                if (exception.Message.Contains(ResourceDoesNotExist))
                {
                    recordDoesNotExistExceptionOccurred = true;
                }
                break;
            }
        }
    }
    else
    {
        break;
    }
}

My questions are:

  1. Is this an efficient way to purge the WADLogs table? If not, what can make this better?
  2. Is this the correct way to handle the "Specified resource does not exist exception"? If not, how can I make this better?
  3. Would this logic fail in any particular case?
  4. How would this approach change if this code is in a worker which has multiple instances deployed?

I have come up with this code by referencing the solution given here.

nikhil
  • It would help if you put your actual code in the question instead of a screenshot. Copying/pasting becomes easier that way :P – Gaurav Mantri Oct 22 '14 at 11:16
  • @Gaurav: I avoided the code since it removes the indentation :). It took time, but I have replaced the image with the code. Thanks. – nikhil Oct 22 '14 at 12:40

1 Answer

  1. Batches are in general the best way to purge part of a table if you can't just delete the whole thing. However, there are a couple of problems with details in the code above that I'll outline below.

  2. I'm not sure why you're getting this error for a table entity which was just returned by the query, unless you're deleting things from this table on a different thread. However, in the scenario you've given, you should not halt the whole process when you get an error. Say you had 1000 results to delete, and the 2nd had been deleted by another thread and returned an error when you tried to delete it again: now you never delete any of the other entities, including the others in that batch, since batches are atomic! To fix this, I'd suggest removing the failing entity from the batch (the error message will include its index), retrying that batch, and then continuing to delete the rest of the batches; see the first sketch after this list.

  3. See 2. Additionally, if you have more than 100 entities with the same partition key, you are not deleting the remainder right away but are instead ignoring them and deleting them in a later loop iteration. If there are many entries with the same partition key, this could result in a lot of extra queries, so it might be more efficient to add them to a new batch instead; the first sketch below chunks a partition into batches of 100 for this reason.

  4. Once you get more workers going, the issue in 2 will get significantly worse with the current design, since you always query the first 1000 results. Multiple instances will download the same results and try to delete the same entities, which will cause a lot of errors. The easiest fix I see would be a better way to divide the work that doesn't cause overlap. One idea might be to have each worker take a different day to delete, which would also have the added benefit of optimizing the number of batches that have to be sent; the second sketch below bounds a query to a single day. Of course, there are many other valid strategies.
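Here is a minimal sketch of points 2 and 3 together, assuming the same Microsoft.WindowsAzure.Storage SDK the question uses. DeletePartitionAsync is a hypothetical helper name, and the index-prefixed error message format (e.g. "2:The specified resource does not exist.") is an assumption about the batch error response worth verifying against your SDK version:

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

static class TablePurger
{
    // Deletes one partition's entities in chunks of at most 100 (point 3) and,
    // on "resource does not exist", drops the reported entity and retries (point 2).
    public static async Task DeletePartitionAsync(CloudTable table, IEnumerable<ITableEntity> entities)
    {
        foreach (var chunk in entities.Select((e, i) => new { e, i })
                                      .GroupBy(x => x.i / 100, x => x.e))
        {
            TableBatchOperation batch = new TableBatchOperation();
            foreach (ITableEntity entity in chunk)
            {
                entity.ETag = "*"; // delete regardless of concurrent updates
                batch.Add(TableOperation.Delete(entity));
            }

            while (batch.Count > 0)
            {
                try
                {
                    await table.ExecuteBatchAsync(batch);
                    break; // the whole batch succeeded
                }
                catch (StorageException exception)
                    when (exception.RequestInformation.HttpStatusCode == 404)
                {
                    // For a failed batch the error message is conventionally prefixed
                    // with the index of the failing operation, e.g.
                    // "2:The specified resource does not exist." (an assumption).
                    string message = exception.RequestInformation.ExtendedErrorInformation.ErrorMessage;
                    int colon = message.IndexOf(':');
                    int failedIndex = colon > 0 ? int.Parse(message.Substring(0, colon)) : 0;
                    batch.RemoveAt(failedIndex); // drop the already-deleted entity and retry
                }
            }
        }
    }
}

The caller would group entities by PartitionKey exactly as the question already does, then hand each group to this method instead of capping it at 100 entities.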
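For point 4, here is a sketch of a day-bounded query, assuming the WADLogs convention that the PartitionKey is "0" followed by the tick count (as the question's own key construction does), so that keys of equal length compare lexicographically in chronological order. QueryForDayUtc is a hypothetical helper name:

// Restrict the query to one UTC day so each worker instance can claim a
// distinct day and instances never contend for the same entities.
static TableQuery<WadLogsEntity> QueryForDayUtc(DateTime day)
{
    string lower = "0" + day.Date.Ticks;              // inclusive start of the day
    string upper = "0" + day.Date.AddDays(1).Ticks;   // exclusive start of the next day
    return new TableQuery<WadLogsEntity>().Where(
        TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.GreaterThanOrEqual, lower),
            TableOperators.And,
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.LessThan, upper)));
}

Each instance could, for example, derive the day it owns from its instance ID, so no extra coordination is needed.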

Emily Gerner
  • Thanks for answering, Emily. I agree with your suggestions. The only thing is that in point #3, if there are more than 100 entries with the same partition key, then since the code is in a while loop, the query will return the remainder of the entries and they will eventually be deleted in batches of 100. Also, the index in the exception always comes back as 0, so I am guessing this happens in the case of multiple instances. I think I will need a strategy to avoid overlap in this scenario. – nikhil Oct 30 '14 at 03:07
  • I just verified, and the exception occurs even in case of a single worker instance. I guess this exception occurs because when the roles are deployed, there are no records older than 15 days. Any way to avoid this exception? – nikhil Oct 30 '14 at 03:28
  • I think ignoring the exception is just fine. For #3 I should clarify that it would be more efficient to do it this way to avoid re-querying the same things, but like you say with the current design it isn't a problem. I'll edit my response to reflect that. – Emily Gerner Oct 30 '14 at 16:34
  • Regarding #2, I believe this can also happen in non-multithreaded scenarios, if the method is called (serially) in quick succession. At least that's the only way I can explain this happening to me in similar circumstances. My theory is that Azure Storage doesn't get the chance to "fully" delete the record before you (successfully) query for it again and subsequently ask for its deletion, at which point it's already completely deleted and the error above is produced. – Ohad Schneider Dec 18 '16 at 17:43