My problem is the same as this one: Scalable Contains method for LINQ against a SQL backend
Synopsis: A user posts a list of longs (ids) to my ASP.NET async controller method. The controller needs to pull two columns from the SQL database for each id and return them as a JSON array. Since I'm using EF/LINQ, as mentioned in the link above, I use the Contains method like so:
long[] ids; // Assume this holds the list of ids posted to the controller method
var courses = await db.Courses.AsNoTracking()
    .Where(x => ids.Contains(x.id))
    .Select(x => new JRecord { id = x.id, name = x.name, status = x.status })
    .ToListAsync();
return Request.CreateResponse(HttpStatusCode.OK, courses);
EF converts Contains into a SQL IN clause. The problem is that most of the time the list of ids holds a few hundred entries, which is fine, but a user could also select a few thousand, which results in a really slow query or the query failing altogether.
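For context, the IN translation is easy to confirm with EF6's Database.Log hook (a sketch, assuming EF6; db is the same context as in the snippet above):

    // Dump the generated SQL to the debug output via EF6's Database.Log hook.
    db.Database.Log = s => System.Diagnostics.Debug.Write(s);

    // For ids = { 1, 2, 3 } the generated WHERE clause comes out roughly as:
    //   WHERE [Extent1].[id] IN (1, 2, 3)
    // i.e. one value per id, which is what gets slow (or fails outright)
    // once thousands of ids are posted.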
The author (in the link above) posted the following solution to another question: a LINQ extension that splits the ID array into chunks, runs each chunk as a separate, smaller query, then merges the results of all the queries back into a single list. The point is not to improve performance but to ensure the query doesn't fail when a lot of ids are provided.
His code: https://stackoverflow.com/a/6852288/934257
public static IEnumerable<IEnumerable<T>> ToChunks<T>(this IEnumerable<T> enumerable, int chunkSize)
{
    int itemsReturned = 0;
    var list = enumerable.ToList(); // Prevent multiple execution of IEnumerable.
    int count = list.Count;
    while (itemsReturned < count)
    {
        int currentChunkSize = Math.Min(chunkSize, count - itemsReturned);
        yield return list.GetRange(itemsReturned, currentChunkSize);
        itemsReturned += currentChunkSize;
    }
}
Usage of his LINQ extension:
var courses = ids.ToChunks(1000)
    .Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
    .SelectMany(x => x).ToList();
I was hoping to adapt this extension to my scenario, so that I could use a simple construct such as ToChunks(1000): it would split the ID array into sections of 1000, run an async EF query on each section, and merge the results back together into a single list (see the sketch below for the kind of thing I mean). That would be a much cleaner and more reusable solution than manually splitting the ID array, writing a for loop, running a query over each section, and merging the results back into a list.
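To make the question concrete, here is a minimal sketch of the kind of helper I'm imagining, built on top of the ToChunks extension above. The name ToChunkedListAsync and the Func-based query parameter are my own invention, not an existing API:

    using System;
    using System.Collections.Generic;
    using System.Data.Entity; // EF6's ToListAsync
    using System.Linq;
    using System.Threading.Tasks;

    public static class ChunkedQueryExtensions
    {
        // Hypothetical helper: splits the keys into chunks, awaits one EF
        // query per chunk, and flattens all results into a single list.
        public static async Task<List<TResult>> ToChunkedListAsync<TKey, TResult>(
            this IEnumerable<TKey> keys,
            Func<IEnumerable<TKey>, IQueryable<TResult>> queryForChunk,
            int chunkSize = 1000)
        {
            var results = new List<TResult>();
            foreach (var chunk in keys.ToChunks(chunkSize))
            {
                // Run the chunks sequentially: a single DbContext does not
                // support concurrent async operations, so no Task.WhenAll.
                results.AddRange(await queryForChunk(chunk).ToListAsync());
            }
            return results;
        }
    }

The controller code would then collapse to something like:

    var courses = await ids.ToChunkedListAsync(
        chunk => db.Courses.AsNoTracking()
            .Where(x => chunk.Contains(x.id))
            .Select(x => new JRecord { id = x.id, name = x.name, status = x.status }),
        chunkSize: 1000);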