7

What is a good tactic for querying Cosmos DB (Table Storage API) to get the "next" item in Storage? "Next" is defined as the top 1 item that was not returned by the last call. The item that was returned by the last call is being held in memory. Using .NET Framework, C#. Table is expected to hold around 2 million entries, so table scans are not preferred. :)

Table Storage looks like this: Partition Key (composite of multiple values): "0000/00/01/2020-01-11". Row key single int value 1 for example. Row contains other string data.

So Key-Value pairs (Partition Key and Row Key) look like the below. The application is read-heavy, but not necessarily using the Log Tail Pattern (https://learn.microsoft.com/en-us/azure/cosmos-db/table-storage-design-guide#log-tail-pattern).

0000/00/01/2020-01-11,1,aaa,x
0000/00/01/2020-01-11,2,aaa,y
0000/00/01/2020-01-11,3,aaa,z
0000/00/01/2020-01-11,4,bbb,x
0001/00/01/2020-01-11,5,aaa,x
0001/00/01/2020-01-11,6,ddd,x

(Note the bottom two entities will be in a different partition, so "aaa,x" exists in two partitions).

So I think querying to get just one item is

TableQuery<MyClass> query = new TableQuery<MyClass>()
                    .Where(TableQuery.GenerateFilterCondition("0000/00/01/2020-01-11", QueryComparisons.Equal, "aaa")).Take(1);

If that is code is correct, and returns "aaa,x" how to ensure that the subsequent query will get "aaa,y" and the next will get "aaa,z", and the next will get "bbb,x" and the next will get "aaa,x" again in the same partition?

If it does not make sense to make rich objects, and instead directly query the REST API and maybe keep the item that was last used in another Table and an Outer Join or other filter condition between the two tables, I'm open to designing in that direction instead.

Thanks!

Snowy
  • 5,942
  • 19
  • 65
  • 119
  • Discovered https://www.wintellect.com/wp-content/uploads/2017/05/AzureStorageTables-1.pdf, helping but I have not found a solution yet. – Snowy Jan 19 '20 at 03:37
  • It sounds like you want a queue, not a table – Hong Ooi Jan 20 '20 at 04:12
  • @HongOoi sorry I don't want a queue, as rows do get updated intermittently, and queue semantics would mean "aaa,x" would be discarded after it is used when I want that row within the partition after it is used. Thanks. – Snowy Jan 20 '20 at 18:29
  • So `1,2,3,4,5,6` are your example row keys, but you need to find all values with a given `aaa` value, which might be the rows `1,4,5`. This means that you basically know almost nothing about the row key. You only know "it must be higher than the last one I found". You will have a bit of scanning there, I don't see how you want to achieve O(1) like you describe in the bounty. Am I understanding this correctly? – Alex AIT Jan 21 '20 at 07:44
  • Second point: I don't quite understand your sample FilterCondition. It has the syntax `GenerateFilterCondition (string propertyName, string operation, string givenValue)` – Alex AIT Jan 21 '20 at 07:54
  • @AlexAIT I don't want only items with an aaa value, but rowkey 1 (aaa,x) then rowkey 2 (aaa,y) then rowkey 3 (aaa,z) then rowkey 4 (bbb,x) then rowkey 1 (aaa,x). Original item was not clear, editing. For the filter condition, I just don't know how it works. – Snowy Jan 22 '20 at 00:42

1 Answers1

5

Two possible ideas depending on your other requirements:

  • Manually iterate through the rows by keeping track of your last rowkey
  • Define a query to get all rows that match your criteria, and leverage the TableContinuationToken

You might need to adapt the filters in my examples to your exact requirements, but the idea should fit either way.


Manually iterate through the rows by keeping track of your last rowkey

string currentPartitionKey = "0000/00/01/2020-01-11";
string currentRowKey = "-1";

var query = new TableQuery()
    .Where(TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, currentPartitionKey),
        TableOperators.And,
        // For the very first query, you may be able to omit this condition
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThan, currentRowKey)))
    .Take(1);

// Execute, do stuff
// Switch to next partitionKey if no results were found


Define a query to get all rows that match your criteria, and leverage the TableContinuationToken

Code could look something like this:

string currentPartitionKey = "initialParitionkey";
do
{
    TableContinuationToken continuationToken = null;
    do
    {

        TableQuery query = new TableQuery()
            .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, currentPartitionKey));

        var segment = await table.ExecuteQuerySegmentedAsync(query, continuationToken,
            new TableRequestOptions { TableQueryMaxItemCount = 1 }, cancellationToken);
        continuationToken = segment.ContinuationToken;

        // process results
    }
    while (continuationToken != null)

    currentPartitionKey = getNextPartitionKey(currentPartitionKey);
}
while (currentPartitionKey != null)
Alex AIT
  • 17,361
  • 3
  • 36
  • 73
  • I am struggling with the CancellationToken part. But experimenting... – Snowy Jan 23 '20 at 00:56
  • You can start with `CancellationToken.None`, tough for actual production use you should ideally have a different source for it. – Alex AIT Jan 23 '20 at 06:22
  • sifting through the overloads, most seem to want an OperationContext. I think your sample is leading in that direction (overload #12), could you show what code changes are needed to make this compile/run? – Snowy Jan 24 '20 at 13:51
  • Why don't you use the one with only `query, continuationToken` parameters? https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.table.cloudtable.executequerysegmentedasync?view=azure-dotnet#Microsoft_Azure_Cosmos_Table_CloudTable_ExecuteQuerySegmentedAsync_Microsoft_Azure_Cosmos_Table_TableQuery_Microsoft_Azure_Cosmos_Table_TableContinuationToken_ – Alex AIT Jan 24 '20 at 14:31
  • Or the non-async version if you prefer: https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cosmos.table.cloudtable.executequerysegmented?view=azure-dotnet – Alex AIT Jan 24 '20 at 14:32
  • In "// process results", I set a dummy DynamicTableEntity to segment.Results[0], and each time it picks the same first result, it does not go to the next result. – Snowy Jan 25 '20 at 03:46
  • It appears that segment has 3 entities in it, not just one as I expected. – Snowy Jan 25 '20 at 03:54
  • You have to use one of the overloads with `TableRequestOptions`, like in my answer. Sorry for the misleading comment above. You can pass `null` for `OperationContext`. If you properly store the continuationToken, you should not get the same result multiple times. – Alex AIT Jan 25 '20 at 08:36
  • TableContinuationToken seems to be a bust. I'll try the manual method instead. – Snowy Jan 26 '20 at 02:10