
I am querying ATS using the following logic:

var query = from m in context.CreateQuery<MyTable>(tableName)
            where m.PartitionKey.CompareTo(partitionKey) == 0  
            select m;

var results = new List<MyTable>();

CloudTableQuery<MyTable> messageTableQuery = (CloudTableQuery<MyTable>)query.AsTableServiceQuery();
ResultContinuation rc = null;

do
{
    var asyncResult = rc == null ? messageTableQuery.BeginExecuteSegmented(null, null) :
        messageTableQuery.BeginExecuteSegmented(rc, null, null);
    ResultSegment<MyTable> result = messageTableQuery.EndExecuteSegmented(asyncResult);
    results.AddRange(result.Results);
    rc = result.ContinuationToken;
} while (rc != null);

return results;

This returns in a reasonable amount of time when I point it at my dev ATS, but it takes painfully long when I point it at prod ATS, which contains millions of partitions. My question is: is there any way I can query ATS more efficiently?

My query doesn't necessarily need to get all the row keys within a partition key. I just need a subset of them.

Brian Tompsett - 汤莱恩
Minas
  • It is not clear from your code above if the query you're executing included `PartitionKey`. For best performance, your query must include `PartitionKey`. – Gaurav Mantri Aug 23 '14 at 13:10
  • Hi Gaurav, I have added the query to the question. – Minas Aug 24 '14 at 05:51
  • You can add filters if you know any specific criteria for selecting the rows, e.g. filter by Timestamp. Not that it is going to make a huge difference; `PartitionKey.CompareTo` can be changed to `==`. `CompareTo` is mostly used when sorting or range comparisons are required. – Aravind Aug 25 '14 at 14:43
  • I know the list of row keys that I want to fetch from ATS, but they don't fit a criteria. They are just a list of GUIDs. Is it faster to query ATS for a full partition and do the filtering on the client, or is it faster to iteratively query ATS for `PartitionKey == x && RowKey == y`? – Minas Aug 25 '14 at 19:15

1 Answer


The number of partitions in your table should not be a problem performance-wise. The number of rows in the partition you are querying, however, can be.

Problems:

1) If there are many rows in the partition matching your partition key, your current query retrieves every one of them in segments of up to 1,000 rows and then filters locally. If the partition holds 100,000 rows, the code makes at least 100 round trips, and each response can be large and full of rows you don't need.

2) Memory pressure. The other problem you could be hitting is thrashing local memory by downloading every row and accumulating them all in a single list.

Solutions:

1) If you know all the RowKey GUIDs you need to retrieve, you could create a more efficient query (depending on how many rows you need):

var query = from m in context.CreateQuery<MyTable>(tableName)
            where m.PartitionKey.CompareTo(partitionKey) == 0
               && (m.RowKey.CompareTo(guid1) == 0
                || m.RowKey.CompareTo(guid2) == 0
                /* ... */)
            select m;

You could end up hitting the maximum length allowed for the query URL if you have many rows to retrieve, in which case you'd want to chunk your list of GUIDs and send a separate request for each chunk.
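A minimal sketch of the chunking idea, using the same `context`/`tableName`/`partitionKey` names as your code. The chunk size of 20, the `rowKeyGuids` list, and building the OData `$filter` string by hand via `AddQueryOption` are my assumptions, not something from your question:

```csharp
const int ChunkSize = 20; // assumed; tune to stay under the URL length limit
var results = new List<MyTable>();

for (int i = 0; i < rowKeyGuids.Count; i += ChunkSize)
{
    var chunk = rowKeyGuids.Skip(i).Take(ChunkSize);

    // Build: PartitionKey eq 'pk' and (RowKey eq 'g1' or RowKey eq 'g2' ...)
    string rowKeyFilter = string.Join(" or ",
        chunk.Select(g => string.Format("RowKey eq '{0}'", g)));
    string filter = string.Format("PartitionKey eq '{0}' and ({1})",
        partitionKey, rowKeyFilter);

    // Attach the raw $filter to the data service query for this chunk.
    var query = (DataServiceQuery<MyTable>)context.CreateQuery<MyTable>(tableName);
    results.AddRange(query.AddQueryOption("$filter", filter).Execute());
}
```

Each chunked request is a partition-scoped query with exact RowKey matches, so the server does the filtering instead of your client.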

Also, you may find it easier to build up your query using table filters (TableQuery.GenerateFilterCondition and TableQuery.CombineFilters) as discussed here: Querying Windows Azure Table Storage with multiple query criteria
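For reference, here is roughly what that looks like. Note this uses the newer table client (`Microsoft.WindowsAzure.Storage.Table`, 2.x+), not the `CloudTableQuery` API in your question, and `table`, `MyTableEntity`, and `rowKeyGuids` are assumed names:

```csharp
// Partition filter: PartitionKey eq '<partitionKey>'
string filter = TableQuery.GenerateFilterCondition(
    "PartitionKey", QueryComparisons.Equal, partitionKey);

// OR together one RowKey condition per GUID, then AND with the partition filter.
string rkFilter = null;
foreach (var g in rowKeyGuids)
{
    string cond = TableQuery.GenerateFilterCondition(
        "RowKey", QueryComparisons.Equal, g.ToString());
    rkFilter = rkFilter == null
        ? cond
        : TableQuery.CombineFilters(rkFilter, TableOperators.Or, cond);
}
filter = TableQuery.CombineFilters(filter, TableOperators.And, rkFilter);

var tableQuery = new TableQuery<MyTableEntity>().Where(filter);
var results = table.ExecuteQuery(tableQuery).ToList();
```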

2) If solution 1 doesn't work for your problem, and you need to filter locally, then filter as you retrieve each segment.

do
{
    // ...
    var filteredResults = FilterResultsByGuid(result.Results); // Imp
    results.AddRange(filteredResults);
    // ...
} while (rc != null);

Where FilterResultsByGuid is your custom method that locally filters out the unneeded rows. This lets the unneeded rows be garbage collected as you go, reducing (though not eliminating) memory pressure.

Brent