20

I've seen various questions around SO about how to get the total row count of an Azure storage table, but I want to know how to get the number of rows within a single partition.

How can I do this while loading a minimal amount of entity data into memory?

dreftymac
goelze

6 Answers

24

As you may already know, there's no Count-like functionality available in Azure Tables. To get the total number of entities (rows) in a partition (or a table), you have to fetch all of them.

You can reduce the response payload by using a technique called query projection. A query projection lets you specify the list of entity attributes (columns) that you want the table service to return. Since you're only interested in the total count of entities, I would recommend fetching only the PartitionKey. You may find this blog post helpful for understanding query projection: https://blogs.msdn.microsoft.com/windowsazurestorage/2011/09/15/windows-azure-tables-introducing-upsert-and-query-projection/.
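As a rough sketch of this approach with the classic SDK (the method name is illustrative, and an initialized CloudTableClient is assumed as a parameter):

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class PartitionCounting
{
    // Count the entities in one partition while downloading only the PartitionKey column.
    public static async Task<long> CountEntitiesInPartition(
        CloudTableClient tableClient, string tableName, string partitionKey)
    {
        CloudTable table = tableClient.GetTableReference(tableName);

        TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>()
            .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey))
            .Select(new[] { "PartitionKey" });

        long count = 0;
        TableContinuationToken token = null;
        do
        {
            // Each segment holds at most 1,000 entities; keep following the continuation token.
            TableQuerySegment<DynamicTableEntity> segment =
                await table.ExecuteQuerySegmentedAsync(query, token);
            count += segment.Results.Count;
            token = segment.ContinuationToken;
        } while (token != null);

        return count;
    }
}
```

Counting per segment avoids accumulating the projected entities in memory.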

Gaurav Mantri
21

Azure Storage Explorer (https://azure.microsoft.com/en-gb/features/storage-explorer/) lets you define a query, and you can use the Table Statistics toolbar item to get the total row count for the whole table or for your query.

[Screenshot: the Table Statistics button in the Azure Storage Explorer toolbar]

Nigel Belham
  • 3
    IMHO this function is useless. It counts only the items returned by the query, which is also shown at the bottom of the page. – Emaborsa Dec 21 '18 at 12:37
  • 4
    The number at the bottom of the page only counts to 1000. The current version will not let you request more. So, this is a useful way around that. – MBentley Apr 23 '19 at 17:37
  • Exactly what I was looking for.. nice one – CraftyFella Apr 23 '20 at 14:11
  • After clicking the button was waiting for something to happen. It turned out the result was being printed on the Activities window below. Thanks for this answer @Nigel. – jokab Feb 25 '22 at 07:43
4

Tested the speed using Stopwatch, fetching and counting 100,000 entities in a partition, each with three fields in addition to the standard TableEntity properties.

I select just the PartitionKey and use a resolver to end up with just a list of strings, which I count once the entire partition has been retrieved.

The fastest I have got it is around 6000-6500 ms. Here is the function:

public static async Task<int> GetCountOfEntitiesInPartition(string tableName, string partitionKey)
{
    // Assumes an initialized CloudTableClient field named tableClient.
    CloudTable table = tableClient.GetTableReference(tableName);

    // Project only PartitionKey to keep the response payload small.
    TableQuery<DynamicTableEntity> tableQuery = new TableQuery<DynamicTableEntity>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey))
        .Select(new string[] { "PartitionKey" });

    EntityResolver<string> resolver = (pk, rk, ts, props, etag) =>
        props.ContainsKey("PartitionKey") ? props["PartitionKey"].StringValue : null;

    List<string> entities = new List<string>();

    TableContinuationToken continuationToken = null;
    do
    {
        TableQuerySegment<string> tableQueryResult =
            await table.ExecuteQuerySegmentedAsync(tableQuery, resolver, continuationToken);

        continuationToken = tableQueryResult.ContinuationToken;

        entities.AddRange(tableQueryResult.Results);
    } while (continuationToken != null);

    return entities.Count;
}

This is a general-purpose function; all you need is the tableName and partitionKey.

NickBrooks
  • 2
    Since you are only after the count you do not need to add the entities fetched to a list, rather just immediately increment a counter with the results count for that segment. – James Westgate Jul 05 '19 at 18:00
3

You could achieve this pretty efficiently by leveraging the atomic batch operations of the Azure Table storage service. For every partition, keep an additional entity with the same partition key and a specific row key, like "PartitionCount". That entity has a single int (or long) property, Count.

Every time you insert a new entity, do an atomic batch operation that also increments the Count property of your partition counter entity. Your partition counter entity has the same partition key as your data entity, which is what allows you to do an atomic batch operation with guaranteed consistency.

Every time you delete an entity, decrement the Count property of the partition counter entity, again in a batch execute operation so the two operations stay consistent.

If you just want to read the partition count, all you need to do is make a single point query for the partition counter entity; its Count property will tell you the current count for that partition.
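A sketch of the insert-plus-increment batch with the classic SDK. The PartitionCounter class, the "PartitionCount" row key, and the method name are all illustrative, not part of any library:

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

// Illustrative counter entity: same PartitionKey as the data, fixed RowKey.
public class PartitionCounter : TableEntity
{
    public long Count { get; set; }
}

public static class CountedInserts
{
    public static async Task InsertWithCountAsync(CloudTable table, TableEntity newEntity)
    {
        // Point query for this partition's counter entity.
        TableResult result = await table.ExecuteAsync(
            TableOperation.Retrieve<PartitionCounter>(newEntity.PartitionKey, "PartitionCount"));
        var counter = (PartitionCounter)result.Result;

        TableBatchOperation batch = new TableBatchOperation();
        if (counter == null)
        {
            counter = new PartitionCounter
            {
                PartitionKey = newEntity.PartitionKey,
                RowKey = "PartitionCount",
                Count = 1
            };
            batch.Insert(counter);
        }
        else
        {
            counter.Count++;
            batch.Replace(counter); // conditional on the ETag from the read above
        }
        batch.Insert(newEntity);

        // Both operations succeed or fail together. A 412 (precondition failed)
        // means another client updated the counter first: re-read and retry.
        await table.ExecuteBatchAsync(batch);
    }
}
```

The ETag check on Replace is what makes the optimistic-concurrency retry discussed below safe.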

Dogu Arslan
  • 1
    Azure Storage Table does not have atomic operations. Every "atomic" operation would require multiple request for read and merge. – Rambalac Aug 31 '17 at 02:26
  • Well, let's start building up knowledge before we post. See here: https://learn.microsoft.com/en-us/rest/api/storageservices/performing-entity-group-transactions and the comment "Operations within a change set are processed atomically; that is, all operations in the change set either succeed or fail. Operations are processed in the order they are specified in the change set." – Dogu Arslan Aug 31 '17 at 08:37
  • 1
    If you never used Azure Storage, please at least read documentation carefully. Azure storage does not have any atomic batch or increment operations over single item. That "either succeed or fail" means you have to repeat retrieve, increment, merge continuously till success, which in concurrent environment means to increment one item you have to send tens of request. – Rambalac Sep 01 '17 at 00:16
  • Surely you are not reading or getting the one line I pasted from the documentation ironically. Just search for the substring atomic in that sentence. From client side a batch operation is atomic simply explaining this to you either all operations succeed or all fail. And that is a general terminology used by the official documentation and industry to refer batch operations. I don't think you have ever used any batch operation because this would be simple to understand then. – Dogu Arslan Sep 01 '17 at 07:29
  • 1
    Ironically you did not read your own "proof". You did not read what a batch operation is. Obviously you've never used Azure Storage. Batch operations are limited only to Storage operations and one entity can appear only once in one batch. There is no Azure Storage operation incrementing or modifying existing values, only replacing the whole entity. You cannot read a value, increment it, and update it in one atomic batch. – Rambalac Sep 01 '17 at 08:24
  • Ok I am explaining you my answer in simpler terms so that you could grasp it at the end of the day I should have written the answer for all levels of experience and expertise like none in your case. When I say increment in the answer, it is basically reading the Partition Count entity first which has the same PK as the item you want to insert, then on the client side incrementing its Partition Count property and then making a batch request to both insert new entity with that partition key but also replace the Partition Count entity with the incremented count value and rely on Optimistic Conc. – Dogu Arslan Sep 01 '17 at 09:08
  • So if in the meantime another client inserted a new item with the same PK then it would have updated the partition Count entity as well and our ENTIRE batch request would fail with 409 because etags on the Partition Count entity would not match, then you would retry same logic, again read the latest partition count entity, increment and do another batch operation. Because batch operation is atomic in that if replacing the Partition Count entity fails it would also fail inserting the new entity as such we would be in consistent state and safely retry. Too complicated for you? – Dogu Arslan Sep 01 '17 at 09:13
  • So the client will 1) read the count record for PK X 2) Construct the new table entity with PK X 3) Increment the local count variable 4) Create a TableBatchOperation with insert new table entity and a merge on the count record. Is that, right? – Morten Holdflod Møller Nov 03 '17 at 11:41
  • yes. and delete entity operation is the same instead of increment obviously you decrement the counter. – Dogu Arslan Nov 03 '17 at 13:58
  • How will Table Storage handle the case where two clients A and B interleave the 4 steps like A1, B1, B2, B3, B4, A2, A3, A4? – Morten Holdflod Møller Nov 03 '17 at 19:49
  • B will succeed A4 will fail because the Etag of the Count entity would be updated by B4 and will not match the one in A4 (see Optimistic Concurrency in Azure Table Storage). I explained these in detail in the answer and comments. – Dogu Arslan Nov 03 '17 at 21:44
  • This is a great thread, a++ would read again. While a great strategy for managing a count, this scenario would fail in the case of using table storage's Upsert operations, InsertOrReplace and InsertOrMerge. There's no way to know before the command is run whether the operation will be an insert or a replace/merge, and so you can't know whether to run the increment or not. Sad! – George Jun 22 '18 at 13:00
  • Yes for InsertOrMerge/Replace operations, a trade off can be to always Insert first (with the count increment operation) and if that fails with entity already exists error then do an unconditional merge in a 2nd I/O call without any increment. – Dogu Arslan Jun 22 '18 at 22:45
2

This can be done a bit more concisely than in @NickBrooks' answer, using the newer Azure.Data.Tables SDK.

public static async Task<int> GetCountOfEntitiesInPartition<T>(
    string tableName, string partitionKey)
    where T : ITableEntity, new()
{
    // Assumes an initialized TableServiceClient field named tableServiceClient.
    var tableClient = tableServiceClient.GetTableClient(tableName);
    var results = tableClient.QueryAsync<T>(t => t.PartitionKey == partitionKey,
        select: new[] { "PartitionKey" });
    return await results.CountAsync();
}

The results.CountAsync() comes from System.Linq.Async, a NuGet package that is officially supported by the .NET team.

Rich_Rich
-1

I think you can directly use .Count() in C#. You can use either this technique:

var tableStorageData = await table.ExecuteQuerySegmentedAsync(azQuery, null);
int count = tableStorageData.Count();

or

TableQuery<UserDetails> tableQuery = new TableQuery<UserDetails>();
var tableStorageData = table.ExecuteQuery(tableQuery, null);
count = tableStorageData.Count();

The count variable will contain the total number of rows returned by the query.

  • No, it won't. The query will have a continuation token. Each query only returns up to 1,000 records. – st0le Feb 01 '21 at 20:44