0

Assume I have a huge table storage where customers are stored in. Let's say Partition Key is their Zip-Code, RowKey is their signup-timestamp.

Now, what is the smartest and most efficient way to find the ONE FIRST customer for a given (zip code) area, who signed up after a given date (the early bird :-) )? Assume that the entries where not ordered when written to the table storage.

My initial idea was to have a helper method like this (which I need anyway for other purposes):

public IEnumerable<Customer> GetCustomers(string zip, long stampStart, long stampEnd)
{
    if (_table == null) return new List<Customer>();

    return query = (from entry in _table.CreateQuery<Customer>()
        where entry.PartitionKey == zip
        && entry.RowKey.CompareTo(stampStart) <= 0 
        && entry.RowKey.CompareTo(stampEnd) >= 0
                 select entry);
}

and then use it to fire a request like this:

public Customer GetEarlyBird(string zip, long stamp)
{
    if (_table == null) return null;

    return
        GetCustomers(zip, stamp, stamp + 31536000) //covers a one year period
            .OrderBy(x => x.SignupStamp)
            .FirstOrDefault();
}

And finally call

var zip = //some zip code;
var lookupStamp = //some long timestamp;
var earlyBird = GetEarlyBird(zip, lookupStamp);

However, due to the OrderBy-call, the entire query result must be evaluated which takes forever. On the other hand, without ordering the result of the query, FirstOrDefault does not necessarily return the Customer who signed up closest after stamp, but instead the first in the list (which could be any customer from that area, as they were not necessarily ordered when stored in the table).

What am I missing? What is the smartest way to "outsource" the ordering to the Database instead of doing it in memory? Or has my approach some other major flaws I'm missing?

Christian
  • 1,589
  • 1
  • 18
  • 36
  • Note that because you return IEnumerable from GetCustomers - your OrderBy and FirstOrDefault happen in memory, not in database. I just answered similar question here: http://stackoverflow.com/a/42576299/5311735. This of course slows down whole process to the ground. – Evk Mar 03 '17 at 11:48
  • So what is your suggestion? Wouldn't returning a list slow down the process even more, as it will require to load all the data into memory first, turn it into a list and then process it? – Christian Mar 03 '17 at 11:55
  • 1
    My suggestion - return IQueryable instead of IEnumerable to avoid premature query. Because now it loads all data in memory first already. – Evk Mar 03 '17 at 11:56
  • So in short - just replace `IEnumerable GetCustomers` with `IQueryable GetCustomers` without changing anything else and see if that helps. – Evk Mar 03 '17 at 12:07
  • Thanks, m8. However, that gives me a "TargetInvocationException" when calling LastOrDefault() on the IQueryable returned from GetCustomers – Christian Mar 03 '17 at 12:29
  • Well that exception should contain more details in inner exception. Though I must admit I don't have much experience with azure storage, that's just general guidelines for all LINQ providers that work with IQueryable. – Evk Mar 03 '17 at 12:30

2 Answers2

2

If you convert the signup-timestamp to DateTime.Ticks and then subtract that from DateTime.Max.Ticks and use that as the row key, Azure Table Storage service will naturally sort the latest entry at the top because it will have the smallest row key. So if you query with specific partition key with Take(1) you will retrieve the latest entry for that partition key. This way there is no partition scan, there is no filtering neither in client nor service.

Dogu Arslan
  • 3,292
  • 24
  • 43
1

As Dogu Arslan says you could use the DateTime.Ticks to become your rowkey and use take method to get the first value.

More details, you could refer to below codes:

    DateTime d1 = new DateTime(2016, 11, 1);
        DateTime d2 = new DateTime(2016, 12, 1);
        var query = (from ent in query2
                     where
                         ent.PartitionKey == "ZIP"
                       && ent.RowKey.CompareTo(string.Format("{0:D19}", d1.Ticks)) > 0
                      && ent.RowKey.CompareTo(string.Format("{0:D19}", d2.Ticks)) < 0
                     select ent).Take(1).FirstOrDefault() ;

I suggest you could also pay attention to below things:

1.If you want to get the who firstly signed up after a given date. I suggest you could directly use the DateTime.Now.Ticks, since the azure table will automatic order the entities by partition key and row key ascending. The early time ticks will be smaller than the time now ticks.

2.You must pad the reverse tick value with leading zeroes to ensure the string value sorts as expected. More details, you could refer to below image:

enter image description here

Brando Zhang
  • 22,586
  • 6
  • 37
  • 65