How to use Table.ExecuteQuerySegmentedAsync() with Azure Table Storage

Question

Using the Azure Storage Client Library 2.1, I'm trying to make a Table Storage query asynchronous. I wrote this code:

public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var theQuery = _table.CreateQuery<TAzureTableEntity>()
                         .Where(tEnt => tEnt.PartitionKey == partitionKey);
    TableQuerySegment<TAzureTableEntity> querySegment = null;
    var returnList = new List<TAzureTableEntity>();
    while(querySegment == null || querySegment.ContinuationToken != null)
    {
        querySegment = await theQuery.AsTableQuery()
                                     .ExecuteSegmentedAsync(querySegment != null ?
                                         querySegment.ContinuationToken : null);
        returnList.AddRange(querySegment);
    }
    return returnList;
}

Let's assume there is a large set of data coming back so there will be a lot of round trips to Table Storage. The problem I have is that we're awaiting a set of data, adding it to an in-memory list, awaiting more data, adding it to the same list, awaiting yet more data, adding it to the list... and so on and so forth. Why not just wrap a Task.Factory.StartNew() around a regular TableQuery? Like so:

public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var returnList = await Task.Factory.StartNew(() =>
        _table.CreateQuery<TAzureTableEntity>()
              .Where(ent => ent.PartitionKey == partitionKey)
              .ToList());
    return returnList;
}

Doing it this way, it seems we're not bouncing back and forth to the SynchronizationContext so much. Or does it really matter?

Edit to Rephrase Question

What's the difference between the two scenarios mentioned above?

You put your "segment" in its own async method and call it using ConfigureAwait(false). — Paulo Morgado, Oct 28 '13 at 00:53
@PauloMorgado -- Ironically, I actually have the TableQuerySegment and while statement in a separate method already, but I didn't know about the ConfigureAwait(false) method on the Task. Thanks for the tip! — Hallmanac, Oct 28 '13 at 09:19
ConfigureAwait(false) is recommended for all library code and I would extend it to all non-UI application code. — Paulo Morgado, Oct 28 '13 at 13:13

score 8 · Accepted Answer · answered Oct 28 '13 at 12:28

The difference between the two is that your second version will block a ThreadPool thread for the whole time the query is executing. This might be acceptable in a GUI application (where all you want is to execute the code somewhere other than the UI thread), but it will negate any scalability advantages of async in a server application.

Also, if you don't want your first version to return to the UI context after each round trip (which is a reasonable requirement), then use ConfigureAwait(false) whenever you use await:

querySegment = await theQuery.AsTableQuery()
                             .ExecuteSegmentedAsync(…)
                             .ConfigureAwait(false);

This way, all iterations after the first one will (most likely) execute on a ThreadPool thread and not on the UI context.
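
To make the shape of the whole loop concrete, here is a minimal sketch that runs without a storage account. Segment<T> and FetchSegmentAsync are hypothetical stand-ins for TableQuerySegment<T> and ExecuteSegmentedAsync, and the integer token is an assumption made for illustration (the real continuation token is an opaque object). Only the await/ConfigureAwait(false) pattern is meant to carry over.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for TableQuerySegment<T>: one batch of results plus
// the token needed to request the next batch (null when nothing is left).
public sealed class Segment<T>
{
    public Segment(IReadOnlyList<T> results, int? continuationToken)
    {
        Results = results;
        ContinuationToken = continuationToken;
    }

    public IReadOnlyList<T> Results { get; }
    public int? ContinuationToken { get; }
}

public static class SegmentedQueryDemo
{
    // Hypothetical stand-in for ExecuteSegmentedAsync. Task.Delay simulates
    // the network round trip; no thread is blocked while it is pending.
    public static async Task<Segment<int>> FetchSegmentAsync(
        IReadOnlyList<int> source, int segmentSize, int? token)
    {
        await Task.Delay(10).ConfigureAwait(false);
        int start = token ?? 0;
        var results = source.Skip(start).Take(segmentSize).ToList();
        int next = start + results.Count;
        return new Segment<int>(results, next < source.Count ? next : (int?)null);
    }

    // Same loop as the first version in the question, with
    // ConfigureAwait(false) so continuations do not resume on the
    // caller's captured context.
    public static async Task<List<int>> GetAllAsync(
        IReadOnlyList<int> source, int segmentSize)
    {
        var all = new List<int>();
        int? token = null;
        do
        {
            var segment = await FetchSegmentAsync(source, segmentSize, token)
                .ConfigureAwait(false);
            all.AddRange(segment.Results);
            token = segment.ContinuationToken;
        } while (token != null);
        return all;
    }

    public static async Task Main()
    {
        var source = Enumerable.Range(1, 25).ToList();
        var all = await GetAllAsync(source, 10);
        Console.WriteLine(all.Count); // prints 25 (segments of 10, 10 and 5)
    }
}
```

The do/while form avoids the "querySegment == null" check from the question, because the first request is simply made with a null token.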

BTW, in your second version, you don't actually need await at all; you could just return the Task directly:

public Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    return Task.Run(() => _table.CreateQuery<TAzureTableEntity>()
                                .Where(ent => ent.PartitionKey == partitionKey)
                                .ToList());
}

How does the querySegment (first version) assist (instead of negate) scalability advantages of async? All things being equal you're still occupying a `ThreadPool` thread. Especially when using the `ConfigureAwait(false)`. Am I thinking correctly? — Hallmanac, Oct 29 '13 at 10:29
@Hallmanac No, that's wrong. The point of `await` is that while the asynchronous operation executes, you are *not* occupying any threads. With `ConfigureAwait(false)`, you're using a `ThreadPool` thread only for a short time between segments, to start the next segment. — svick, Oct 29 '13 at 13:53
Gotcha. Didn't realize that it wasn't occupying a `ThreadPool` thread while waiting for the `QuerySegment` to return with data. Ironically, while digging deeper on your previous comment I found another good answer you gave on this subject. :-) Here's the link in case others find it as useful as I did. http://stackoverflow.com/a/14898584/350312 — Hallmanac, Oct 29 '13 at 18:04

Gaurav Mantri · Answer 2 · 2013-10-28T13:30:08.820

Not sure if this is the answer you're looking for but I still want to mention it :).

As you may already know, the 2nd method (using Task) handles continuation tokens internally and returns only once all entities have been fetched, whereas the segmented call used in the 1st method fetches one set of entities (up to a maximum of 1000) and then returns, giving you the result set as well as a continuation token.

If you're interested in fetching all entities from a table, both methods can be used; however, the 1st one gives you the flexibility of breaking out of the loop gracefully at any time, which you don't get with the 2nd one. So with the 1st approach you could essentially introduce pagination.

Let's assume you're building a web application that shows data from a table. Further, let's assume that the table contains a large number of entities (say, 100000). Using the 1st method, you can fetch just 1000 entities and return the result to the user; if the user wants more, you can fetch the next set of 1000 entities and show them. You can continue doing that for as long as the user wants and there's data in the table. With the 2nd method, the user would have to wait until all 100000 entities have been fetched from the table.
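
As a rough sketch of that pagination idea, the repository method below makes a single round trip and returns the page together with a token that the caller passes back later. Page<T>, GetPageAsync and the integer token are hypothetical names invented for this illustration; with Table Storage the method body would be one segmented query call and the token would be the library's own continuation token type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical page type: the entities for one page plus a token the caller
// hands back to get the next page (null when nothing is left).
public sealed class Page<T>
{
    public Page(IReadOnlyList<T> items, int? nextToken)
    {
        Items = items;
        NextToken = nextToken;
    }

    public IReadOnlyList<T> Items { get; }
    public int? NextToken { get; }
}

public static class PagedQueryDemo
{
    // Hypothetical repository method: performs exactly one round trip and
    // returns, instead of looping until the token is null.
    public static async Task<Page<string>> GetPageAsync(
        IReadOnlyList<string> table, int pageSize, int? token)
    {
        await Task.Delay(10).ConfigureAwait(false); // simulated network latency
        int start = token ?? 0;
        var items = table.Skip(start).Take(pageSize).ToList();
        int next = start + items.Count;
        return new Page<string>(items, next < table.Count ? next : (int?)null);
    }

    public static async Task Main()
    {
        var table = Enumerable.Range(1, 5).Select(i => "entity" + i).ToList();

        // First request: there is no token yet.
        var first = await GetPageAsync(table, 2, null);
        Console.WriteLine(string.Join(", ", first.Items));  // entity1, entity2

        // Later request: the consumer passes the token back.
        var second = await GetPageAsync(table, 2, first.NextToken);
        Console.WriteLine(string.Join(", ", second.Items)); // entity3, entity4
    }
}
```

The caller decides whether to request another page, so a user who only looks at the first page never pays for fetching the rest.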

This was the other reason I was thinking about this. If I were to trick/force the method to return a true IEnumerable then I negate the benefits of async/await. Though your pagination use case made me think a bit. You would have to pass up the continuation token to the consumer so that they can pass it back to the repository in order to pick up where they left off. This would be a good use case for async in this context. — Hallmanac, Oct 28 '13 at 22:06
