I am using Entity Framework and frequently run into an issue where I want to iterate through a large number of records. If I pull them all at once, I risk a timeout; if I pull them one at a time, every single record becomes a separate query and it takes forever.
I want to implement a LINQ extension that pulls the results in batches but can still be used as an IEnumerable. I would give it a set of keys (most likely the primary IDs of whatever records I'm pulling), a batch size (higher for simple objects, lower for complex objects), and a Func<T, IEnumerable<int>, bool> that defines how to apply a batch of keys to records of type T. I would call it like this:
//get the list of items to pull--in this case, a set of order numbers
List<int> orderNumbers = GetOrderNumbers();
//set the batch size
int batchSize = 100;
//loop through the set using the BatchedSelector extension. Note the selector
//function at the end, which defines how a batch of keys matches records
foreach (var order in dbContext.Orders.BatchedSelector(orderNumbers, (o, k) => k.Contains(o.OrderNumber), batchSize))
{
//do things
}
Here's my draft solution:
/// <summary>
/// A LINQ extension that fetches IEnumerable results in batches, aggregating queries
/// to improve EF performance. Operates transparently to the application and acts like
/// any other IEnumerable.
/// </summary>
/// <typeparam name="T">Header record type</typeparam>
/// <param name="source">Full set of records</param>
/// <param name="keys">The set of keys that represent specific records to pull</param>
/// <param name="selector">Function that filters the result set to only those which match the key set</param>
/// <param name="maxBatchSize">Maximum number of records to pull in one query</param>
/// <returns>The matching records, yielded batch by batch</returns>
public static IEnumerable<T> BatchedSelector<T>(this IEnumerable<T> source, IEnumerable<int> keys, Func<T, IEnumerable<int>, bool> selector, int maxBatchSize)
{
//the index of the next key (or set of keys) to process--we start at 0 of course
int currentKeyIndex = 0;
//to provide some resilience, we will allow the batch size to decrease if we encounter errors
int currentBatchSize = maxBatchSize;
int batchDecreaseAmount = Math.Max(1, maxBatchSize / 10); //10%, but at least 1
//other starting variables; a list to hold results and the associated batch of keys
List<T> resultList = null;
IEnumerable<int> keyBatch = null;
//while there are still keys remaining, grab the next set of keys
while ((keyBatch = keys.Skip(currentKeyIndex).Take(currentBatchSize)).Any())
{
//try to fetch the results
try
{
resultList = source.Where(o => selector(o, keyBatch)).ToList(); // <-- this is where errors occur
currentKeyIndex += currentBatchSize; //advance past the keys we just processed (by the current, possibly reduced, batch size)
}
catch
{
//decrease the batch size for our retry
currentBatchSize -= batchDecreaseAmount;
//if we've run out of batch overhead, throw the error
if (currentBatchSize <= 0) throw;
//otherwise, restart the loop
continue;
}
//since we've successfully gotten the set of keys, yield the results
foreach (var match in resultList) yield return match;
}
//the loop is over; we're done
yield break;
}
For some reason, the Where clause has no effect. I've verified that the correct keys are in keyBatch, but the expected WHERE OrderNumber IN (k1, k2, k3, kn) clause never shows up in the generated SQL. It is as if I didn't have the Where statement at all.
My best guess is that I need to build the expression and compile it, but I'm not sure that's the problem, and I'm not really sure how to go about fixing it. Would love any input. Thanks!
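To illustrate what I mean by building the expression: I imagine constructing something equivalent to o => keys.Contains(o.OrderNumber) as an expression tree. This is just a standalone sketch (the Order class and names here are placeholders, and I'm testing against an in-memory list by compiling the expression, not against EF), so I don't know yet whether it solves my actual problem:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Order
{
    public int OrderNumber { get; set; }
}

public static class Program
{
    //build the expression tree for: o => keys.Contains(o.OrderNumber)
    public static Expression<Func<Order, bool>> BuildContainsFilter(IEnumerable<int> keys)
    {
        var param = Expression.Parameter(typeof(Order), "o");
        var property = Expression.Property(param, nameof(Order.OrderNumber));
        var keysConstant = Expression.Constant(keys, typeof(IEnumerable<int>));
        //resolves to the static method Enumerable.Contains<int>(keys, o.OrderNumber)
        var containsCall = Expression.Call(
            typeof(Enumerable), nameof(Enumerable.Contains),
            new[] { typeof(int) }, keysConstant, property);
        return Expression.Lambda<Func<Order, bool>>(containsCall, param);
    }

    public static void Main()
    {
        var orders = new List<Order>
        {
            new Order { OrderNumber = 1 },
            new Order { OrderNumber = 2 },
            new Order { OrderNumber = 3 },
        };
        var filter = BuildContainsFilter(new[] { 1, 3 });
        //against a real DbSet I would pass the expression itself to Where;
        //here I compile it so I can test against the in-memory list
        var matches = orders.Where(filter.Compile()).Select(o => o.OrderNumber).ToList();
        Console.WriteLine(string.Join(",", matches)); // 1,3
    }
}
```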