2

I have the following code:

List<int> projectIds=GetProjectsIds();// size is about 100,000 int value.
List<int> userIds=GetUsersIds();// size is about 100,000 int value.
List<int> nextIds=GetNextIds();// size is about 100,000 int value.


var IQueryList= db.Users.Where(obj=> projectIds.any(x=>x==obj.ProjectID) 
                                    || userIds.any(x=>x==obj.UserId) 
                                    || nextIds.any(x=>x==obj.NextId) );

it takes really long time when converting the IQueryable to List.

var YearsGroup = IQueryList.GroupBy(x => x.CreatedOn.Year)
                           .Select(g => g.FirstOrDefault())
                           .ToList()
                           .OrderByDescending(x => x.CreatedOn.Year);
// it's taking about 3 seconds to get executed.

How can I solve this issues, is there any other ways that I can apply to my code?

Michał Turczyn
  • 32,028
  • 14
  • 47
  • 69
Hisham Aburass
  • 606
  • 1
  • 8
  • 15
  • The above code does not compile. _`any` as an example._ – mjwills Oct 18 '18 at 07:28
  • 2
    what's the code of `GetXXXIds()`? Probably you could run that in your database instead of fetching all these values into your application – fubo Oct 18 '18 at 07:28
  • 4
    I'm actually surprised it only takes 3 seconds, this code is actually translated to a `WHERE.. IN` with 300,000 integers! – Alexander Derck Oct 18 '18 at 07:29
  • `IQueryable` to `List` is bringing data to memory, no more remote processing, so it depends upon, how much data you are bringing, what kind of network, which will cause time delay – Mrinal Kamboj Oct 18 '18 at 07:29
  • 1
    If you change `projectIds` etc to `HashSet` rather than `List` and then use `Contains`, how long does the code take to run? How long does `var bob = db.Users.ToList();` take to run? – mjwills Oct 18 '18 at 07:29
  • Take a step back, tell us what you want `YearsGroup` to contain and why. – mjwills Oct 18 '18 at 07:32
  • @GSerg How did you come to the conclusion the OP was using L2S? To be honest, I am assuming the logic is occurring client-side rather than server-side (hence why I asked how long `ToList` is taking, since you can't pass 300,000 parameters in most databases). – mjwills Oct 18 '18 at 07:32
  • 1
    @more2know What database platform are you using (SQL Server? something else?)? Are you using Linq to SQL? Entity Framework? EF Core? – mjwills Oct 18 '18 at 07:34
  • 3
    Why do you build those three lists with 300000 ID's in memory and then query the database with each and every one on three different columns? Are you sure that you can't get them in the database and link to the user-table? – Tim Schmelter Oct 18 '18 at 07:34
  • 2
    You should really execute the `ToList` at the end of the query. For two reasons: you get a collection as result and not a query which would be executed again and again whenever you use it(for example with `foreach(var grp in OrderByDescending)...)`. You would also execute the `OrderByDescending` in the database and not in memory. – Tim Schmelter Oct 18 '18 at 07:39
  • @TimSchmelter OrderBy is actually often faster to do on the client side. Sorting needs the whole sequence, so server can not start to send data unless it has it all. It is a pipelining killer. – Antonín Lejsek Oct 18 '18 at 07:50
  • 1
    @AntonínLejsek: sorting is a core discipline of a database. If that's too slow you have to determine why. My point was also that `YearsGroup` is not a list even if he used `ToList`. The `OrderByDescending` at the end makes it a query. So he can't use any list methods and the query gets executed everytime he "touches" it. So if the in-memory-order is really useful here he should add another `ToList` at the end to "materialize" the query into a collection and he should replace the first ToList with AsEnumerable. – Tim Schmelter Oct 18 '18 at 07:53
  • @AlexanderDerck it's solved after converting the query to list before applying the GroupBy, do you have any comment for this ? – Hisham Aburass Oct 18 '18 at 08:46
  • @TimSchmelter thank you, I've just seen your comment, converting to list solved it. – Hisham Aburass Oct 18 '18 at 08:47
  • @more2know Well the `Where` clause where you check the ID's is what surprised me that is so fast. If you want to see the actual query generated, you can do `var sqlQuery = IQueryList.ToString()`. – Alexander Derck Oct 18 '18 at 08:48
  • How many rows in the Users table? – Magnus Oct 18 '18 at 09:07

1 Answers1

3

It's called Abstraction leak and it is the result of switching between Linq-To-Entities (i.e. Entity Framework) into - Linq-To-Objects.

Basically what the law states is that developers of reliable software must learn the abstraction's underlying details anyway.

Meaning:

your abstraction exposes some of the implementation details and there is nothing you can do with it unfortunetly.

From wikipedia:

A leaky abstraction refers to any implemented abstraction, intended to reduce (or hide) complexity, where the underlying details are not completely hidden

One way to solve it is to create an object with good constraint and maintain linq queries with As less as conditions possible.

In your code you are using Where and GroupBy which might cause for the penalty in the abstraction leak:

List<int> projectIds=GetProjectsIds();// size is about 100,000 int value.
List<int> userIds=GetUsersIds();// size is about 100,000 int value.
List<int> nextIds=GetNextIds();// size is about 100,000 int value.

//Linq to sql statement
var IQueryList= db.Users.Where(obj=> projectIds.Any(x=>x==obj.ProjectID) 
                                    || userIds.Any(x=>x==obj.UserId) 
                                    || nextIds.Any(x=>x==obj.NextId) );

//Linq to object, possible abstraction leak.
var YearsGroup = IQueryList.GroupBy(x => x.CreatedOn.Year)
                           .Select(g => g.FirstOrDefault())
                           .ToList()
                           .OrderByDescending(x => x.CreatedOn.Year);
mjwills
  • 23,389
  • 6
  • 40
  • 63
Barr J
  • 10,636
  • 1
  • 28
  • 46
  • @BarrJ thank you, do you suggest converting IQueryList to List before applying the GroupBy, like this .... var YearsGroup = IQueryList.ToList().GroupBy(x => x.CreatedOn.Year).... – Hisham Aburass Oct 18 '18 at 08:11
  • I'd prefer this https://stackoverflow.com/questions/1578778/using-iqueryable-with-linq/1578809#1578809 :) – Barr J Oct 18 '18 at 08:15
  • @BarrJ converting it to a List before applying the GroupBy solved my problem, that's really strange, can you give more information about this issue ? thank you ! – Hisham Aburass Oct 18 '18 at 08:28
  • Read the article I posted in my answer, all is there :) – Barr J Oct 18 '18 at 08:29