-2

My problem is same as this one: Scalable Contains method for LINQ against a SQL backend

Synopsis: A User posts a list of longs (ids) to my asp.net async controller method. The controller needs to pull two columns from the SQL database for each id and returns it as json array. Since I'm using EF/Linq, as mentioned in the link above, I use the Contains method as such:

long[] ids;   //Assume that the posted list of ids to the controller method

var courses = await db.Courses.AsNoTracking().Where(x => ids.Contains(x.id))
.Select(x => new JRecord { id = x.id, name = x.name, status = x.status})
.ToListAsync();

return Request.CreateResponse(HttpStatus.OK, courses);

EF converts Contains into SQL IN statement. The problem is majority of the time, list of ids is few 100 which is fine, but a user could also select few thousand entries which results in a really slow query or query failing altogether.

The author (in link above) posted following solution to another problem where Linq Extension just splits the IDs array into chunks, runs each one as a separate & smaller query, then merges the results from all the queries back into a single list. The reason is not to improve performance but to ensure the Query doesn't fail when a lot of ids are provided.

His code: https://stackoverflow.com/a/6852288/934257

public static IEnumerable<IEnumerable<T>> ToChunks<T>(this IEnumerable<T> enumerable, int chunkSize)
{
     int itemsReturned = 0;
     var list = enumerable.ToList(); // Prevent multiple execution of IEnumerable.
     int count = list.Count;
     while (itemsReturned < count)
     {
          int currentChunkSize = Math.Min(chunkSize, count - itemsReturned);
          yield return list.GetRange(itemsReturned, currentChunkSize);
          itemsReturned += currentChunkSize;
     }
}

Usage of his Linq Extension:

var courses = ids.ToChunks(1000)
                 .Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
                 .SelectMany(x => x).ToList();

I was hoping to adopt this extension to my scenario, so I could use a simple construct such as ToChunks(1000) and it will split ID array into 1000 length long sections, run async EF query on each section of IDs and merge the result back together into a single list. It would be much cleaner and reusable solution than manually splitting ID array, creating a for loop and running queries individually over ID array sections and merging the results back into a list.

tunafish24
  • 2,288
  • 6
  • 28
  • 47
  • 3
    It doesn't make sense to talk about async operations here unless the source is itself async; are you proposing to switch to `IAsyncEnumerable`? If so, https://www.nuget.org/packages/System.Linq.Async/ might provide everything you need, but: this is a big change. Again, to emphasize: you can't just wave a wand and have a non-async method become async - the thing you're doing needs to inherently support async – Marc Gravell Feb 22 '21 at 18:36
  • @MarcGravell In example above ids is a List and Courses is a EF model representing a Table in SQL Server. The method will be called in asp.net webapi async controller, so need to make the data retrieval async as well. – tunafish24 Feb 22 '21 at 18:39
  • The main issue is that my endpoint will get a list of maybe 4000 ids, that I need to lookup in the database. Traditional Linq Contains() is not performant enough, and this non-async Extension method divides the ids into chunks and creates multiple sql queries. However, since the answer is quite old, its written using non-async semantics so not optimal solution for today. – tunafish24 Feb 22 '21 at 18:43
  • What is `list.GetRange` and is there an async version of it? – John Wu Feb 22 '21 at 18:49
  • No. This is an extension method using IEnumerable, I'm not good with extension methods, but my understanding is that Async versions use IQueryable interface. – tunafish24 Feb 22 '21 at 19:05
  • There's a reason that methods like `ToListAsync()` use the `IQueryable` interface. If you're passing a value into this method that has already been materialized, there will be no benefit at all to making it pretend that it's asynchronous. This synchronous solution is most likely the optimal solution. – StriplingWarrior Feb 22 '21 at 21:44
  • https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.getrange `GetRange` has been around for a while, but it just creates shallow copies of a list – pinkfloydx33 Feb 22 '21 at 22:48
  • IMHO, what you really want is a table value parameter. Or maybe to scan your id list for consecutive values. – Jeremy Lakeman Feb 22 '21 at 23:40
  • @MarcGravell Updated the question with specific details. – tunafish24 Feb 23 '21 at 08:07

2 Answers2

3

This is an XY problem. What you're trying to achieve isn't the same thing as what your question is asking. Your desired syntax wouldn't work, even if you made this method async:

var courses = await ids.ToChunks(1000)
                 .Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
                 .SelectMany(x => x).ToListAsync();

And the result you really want is just as easily accomplished without making this method async:

var courses = new List<Course>();
foreach(var chunk in ids.ToChunks(1000))
{
    courses.AddRange(await Courses.Where(c => chunk.Contains(c.CourseID)).ToListAsync());
}

Note: As Syatoslav pointed out, any attempt to get better performance via concurrency will be thwarted by the fact that DbContext won't let you perform multiple concurrent async calls. So you really won't get much advantage to doing this asynchronously anyway. And in some cases, you might find it to be much slower.

StriplingWarrior
  • 151,543
  • 27
  • 246
  • 315
  • 1
    You have introduced another problem. DbContext is not thread safe. – Svyatoslav Danyliv Feb 22 '21 at 21:51
  • Great point. I forgot that! – StriplingWarrior Feb 22 '21 at 21:54
  • Wrong again `Where` is not a materialisation and not async method. – Svyatoslav Danyliv Feb 22 '21 at 22:03
  • @SvyatoslavDanyliv: I was still working through things. How about now? – StriplingWarrior Feb 22 '21 at 22:07
  • Removed my previous comment. But this answer become very inefficient. For sure if OP want something like that. In your answer `courses` transformed to list... Why? – Svyatoslav Danyliv Feb 22 '21 at 22:13
  • `courses` was a list (via `ToListAsync()`) in the OP's desired usage syntax. I didn't change that. – StriplingWarrior Feb 22 '21 at 22:14
  • Well, for now you have N queries for N chunks. Even worse it is not correct result. – Svyatoslav Danyliv Feb 22 '21 at 22:17
  • @SvyatoslavDanyliv: I think that's the intent. If you look at the links provided in the OP, the problem was with generating a too-large query when including all of the IDs in a single query. So batching the ids would create a handful of round-trips, but still get all the requested courses. I'm basically giving the same behavior as what exists in the current, working code, but allowing the database round-trips to be done asynchronously. – StriplingWarrior Feb 23 '21 at 00:19
  • Thanks for the answer. I think I wasn't able to get my problem across well enough, so I've updated the question with details. The issue is not about performance but about being able to handle large array of IDs with EF/Linq Contains operator. – tunafish24 Feb 23 '21 at 08:13
  • I see how your answer works though, and could modify it to accomplish what I wanted to do. Since I have several methods that accept list of IDs, I was hoping to have a small Linq Extension that could be part of the query instead of manually creating the for loops. – tunafish24 Feb 23 '21 at 08:16
  • 1
    @tunafish24: You can probably still create extension methods to help: they're just different from the ones you ask for. For example, in the post you link, there's a "ChunkyContains" method: that's the one you could make Async and reuse in situations like this. Again, though, unless you're trying to take advantage of a cancellation token or something, it's probably a waste of time to bother making it Async. Just use the synchronous code you've already got. – StriplingWarrior Feb 23 '21 at 16:10
  • Yea, I guess this solution is good enough, no need to waste time, especially since the question got closed anyway. – tunafish24 Feb 23 '21 at 18:03
0

Simplest implementation that I can imagine:

public static async Task<IEnumerable<IEnumerable<T>>> ToChunksAsync<T>(this IQueryable<T> enumerable, int chunkSize)
{
     int itemsReturned = 0;
     var list = await enumerable.ToListAsync(); 
     int count = list.Count;
     while (itemsReturned < count)
     {
          int currentChunkSize = Math.Min(chunkSize, count - itemsReturned);
          yield return list.GetRange(itemsReturned, currentChunkSize);
          itemsReturned += currentChunkSize;
     }
}

For sure we can play with IAsyncEnumerable, but this is another challenge. Currently I do not see performance benefits of doing that.

Svyatoslav Danyliv
  • 21,911
  • 3
  • 16
  • 32
  • 1
    It should be noted that there's hardly any advantage to this `ToChunksAsync()` implementation versus simply calling `await ....ToListAsync()` and passing the result into the original `.ToChunks()` method. – StriplingWarrior Feb 22 '21 at 21:39
  • @StriplingWarrior, agree. Sometimes people want something without understanding what is async and what is done by compiler to make async magic works. – Svyatoslav Danyliv Feb 22 '21 at 21:45