Assume I have a huge table storage where customers are stored in. Let's say Partition Key is their Zip-Code, RowKey is their signup-timestamp.
Now, what is the smartest and most efficient way to find the ONE FIRST customer for a given (zip code) area, who signed up after a given date (the early bird :-) )? Assume that the entries where not ordered when written to the table storage.
My initial idea was to have a helper method like this (which I need anyway for other purposes):
public IEnumerable<Customer> GetCustomers(string zip, long stampStart, long stampEnd)
{
if (_table == null) return new List<Customer>();
return query = (from entry in _table.CreateQuery<Customer>()
where entry.PartitionKey == zip
&& entry.RowKey.CompareTo(stampStart) <= 0
&& entry.RowKey.CompareTo(stampEnd) >= 0
select entry);
}
and then use it to fire a request like this:
public Customer GetEarlyBird(string zip, long stamp)
{
if (_table == null) return null;
return
GetCustomers(zip, stamp, stamp + 31536000) //covers a one year period
.OrderBy(x => x.SignupStamp)
.FirstOrDefault();
}
And finally call
var zip = //some zip code;
var lookupStamp = //some long timestamp;
var earlyBird = GetEarlyBird(zip, lookupStamp);
However, due to the OrderBy-call, the entire query result must be evaluated which takes forever. On the other hand, without ordering the result of the query, FirstOrDefault does not necessarily return the Customer who signed up closest after stamp, but instead the first in the list (which could be any customer from that area, as they were not necessarily ordered when stored in the table).
What am I missing? What is the smartest way to "outsource" the ordering to the Database instead of doing it in memory? Or has my approach some other major flaws I'm missing?