7

I am trying to GroupJoin some data with an IQueryable and project that data into an anonymous type. The original entity that I am GroupJoining onto has an ICollection navigation property (ie. one:many). I want to eager load that property so I can access it after the group join without EF going back to the DB. I know that Include() doesn't work when you use a GroupJoin, but the following code is the only way I have found to make it eager load the collection (ContactRoomRoles):

using (var context = new MyDbContext()) {
    var foundRooms = context.Rooms.Include(rm => rm.ContactRoomRoles);
    foundRooms.ToList();  // <-- Required to make EF actually load ContactRoomRoles data!

    var roomsData = foundRooms
        .GroupJoin(
            context.Contacts,
            rm => rm.CreatedBy,
            cont => cont.Id,
            (rm, createdBy) => new {
                ContactRoomRoles = rm.ContactRoomRoles,
                Room = rm,
                CreatedBy = createdBy.FirstOrDefault()
            }
        )
        .ToList();

    var numberOfRoles1 = roomsData.ElementAt(1).Room.ContactRoomRoles.Count();
    var numberOfRoles2 = roomsData.ElementAt(2).Room.ContactRoomRoles.Count();
    var numberOfRoles3 = roomsData.ElementAt(3).Room.ContactRoomRoles.Count();
}

If I remove the foundRooms.ToList(), EF goes off to the database 3 times to populate my numberOfRoles variables at the end, but with foundRooms.ToList() it doesn't - it just eager loads the data in one query upfront.

Although this works, it feels like a total hack. I'm just calling .ToList() for the side-effect of making EF actually load the collection data. If I comment that line out, it goes to the database any time I try to access ContactRoomRoles. Is there a less hacky way to make EF eager load that navigation property?

NOTE: I want to use the navigation property rather than projecting it into a new property of the anonymous type because AutoMapper wants to access Room.ContactRoomRoles when it's mapping onto a DTO object.

Jez
  • 27,951
  • 32
  • 136
  • 233
  • I think it’s kind of funny that you’d use an ORM and at the same time complain that the code seems hacky. You should put a trace on the database and have a look at the SQL code it generates if you want to see some ugly code! – theMayer Oct 15 '18 at 11:15
  • 1
    If you're doing this only for the sake of arriving at an AutoMapper generated DTO graph, why don't you have AutoMapper do the projection for you as part of the query using the queryable extensions feature? http://docs.automapper.org/en/stable/Queryable-Extensions.html – Jonas Høgh Oct 15 '18 at 13:13
  • 1
    The non hacky way is to define and use proper navigation properties instead of `GroupJoin`. Any manual join in EF is indicator of improper entity model. In your case, the `ContactRoomRole` entity should have navigation property to `Contact` representing the `CreatedBy` FK. – Ivan Stoev Oct 15 '18 at 19:39
  • @IvanStoev The actual query is more complex and requires group join as I have a subquery I'm joining. This example is just simplified. – Jez Oct 15 '18 at 21:01
  • 1
    Then may be provide more realistic example. The rules for eager loading are clear - see https://stackoverflow.com/questions/52767559/does-the-order-of-includes-when-filtering-have-an-impact-on-performance/52768226#52768226. What are you using in your current sample is called tracking and navigation property fixup during the materialization of a query returning entities in memory. – Ivan Stoev Oct 15 '18 at 21:12
  • Normally, projecting an entity's collection along with the entity itself in an anonymous type will wire up the entity's nav property appropriately, so I'm not sure what you're doing that prevents it from happening and I agree with @IvanStoev, the actual code in use is required. Did you disable tracking? – Moho Oct 15 '18 at 23:15

3 Answers3

5

This is not a hack. This is an abstraction leak. We should be ready to meet abstraction leaks using ORM tools (and any other internal DSL).

After ToList() you not only execute actual sql call (and load data into memory) but also cross to other Linq flavor - "Linq for objects". After this all your calls of Count() doesn't generate sql just because you start working with in memory collections (not with expression trees those are hidden by IQueryable - the return type of GroupBy statement, but with List collection - return type of ToList).

Without ToList() you stay with "Linq for sql" and EF will translate each call of Count() on IQuerybale to sql; Three Conut() call = three underlined Sql statements.

There are no way to avoid this, otherwise then to calculate all count(*) values on server side in one complex query. If you will try to write such query with Linq (constructing expression tree) - you will meet abstraction leak again. ORM tool is designed to map objects to "RDBS entities" staying with CRUD (Create Read Update Delete) operations - if statement become more complex - you will be not able to foresee generated sql (and all runtime exceptions like 'can't generate sql for such linq'). So do not use linq for complex 'report like' queries (in some cases you could - it depends on your re-usage requirements and testing possibilities). Use old good SQL and call it through ADO or EF ADO "sql extensions" like EF Core FromSql:

var blogs = context.Blogs
    .FromSql("EXECUTE dbo.GetMostPopularBlogsForUser {0}", user)
    .ToList();

Update: it is a good recommendation also to avoid using lazy loading and manual entities loading if you are not working on reusable EF tools. They are in some sense opposite to linq queries - expression trees. They were important (if not only one) option to achieve referenced entities loading on "old" platforms where were no "expression trees" in language but in .NET/EF where full queries can be written "declarative way" as expression trees without execution (but with postponed interpretation) there should be very strong reason to return back to "manual" loading.

Roman Pokrovskij
  • 9,449
  • 21
  • 87
  • 142
  • `List` can be converted to `IQueryable` as well by `.AsQueryable()` extension method. The matter is one should understand which implementation (EF query provider or a list wrapper) of `IQueryable` is used in each LINQ call. Also calls over EF `IQueryable` like `.ToList()`, which actually execute SQL, are called materialization in docs. – stop-cran Oct 15 '18 at 13:39
  • 1
    Isn't CRUD (Create Read Update Delete)? – mxmissile Oct 15 '18 at 16:14
3

It's all about collections that are marked as loaded, or not.

The line

foundRooms.ToList();

(or foundRooms.Load())

loads all Rooms and their ContactRoomRoles collections into the context. Since the Include statement is used, these collections are marked as loaded by EF. You can check that by looking at

context.Entry(Rooms.Local.First()).Collection(r => r.ContactRoomRoles).IsLoaded

which should return true.

If you omit the line foundRooms.ToList();, each time a Room.ContactRoomRoles collection is accessed, EF will notice it's not marked as loaded yet and will lazy-load it. After that, the collection is marked as loaded, but it took an extra query.

A collection is only marked as loaded when it is -

  • Include-ed
  • loaded by lazy loading
  • loaded by the Load() statement, as in

    context.Entry(Rooms.Local.First()).Collection(r => r.ContactRoomRoles).Load();
    

Not when it is part of a projection into another property (like the part ContactRoomRoles = rm.ContactRoomRole in your query).

However, after the statement var roomsData = foundRooms (...).ToList() all Room.ContactRoomRoles are populated, because the query did load them into the context, and EF's always executes the relationship fixup process, which auto-populates navigation properties.

So, to summarize, after your query you have roomsData containing room objects with ContactRoomRoles collections that are populated but not marked as loaded.

Knowing this, it's apparent now that the only thing to do is: prevent lazy loading to occur.

The best way to achieve that is to prevent EF from creating entity objects that are capable of lazy loading, aka proxies. You do that by adding the line

context.Configuration.ProxyCreationEnabled = false;

just below the using statement.

Now you'll notice that the line

var numberOfRoles1 = roomsData.ElementAt(1).Room.ContactRoomRoles.Count();

doesn't trigger an extra query, but does return the correct count.

Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
  • You say "Knowing this, it's apparent now that the only thing to do is: prevent lazy loading to occur." - however, can't I just use the `.Load()` statement to make EF load the `ContactRoomRoles` collections? Why do I also need to prevent lazy loading? – Jez Oct 15 '18 at 21:17
  • 1
    You'd have to `Load()` all collections for each `room` individually. – Gert Arnold Oct 15 '18 at 21:19
  • 1
    This is the correct answer. @Jez, take a look at [Applying filters when explicitly loading related entities](https://learn.microsoft.com/en-us/ef/ef6/querying/related-data#applying-filters-when-explicitly-loading-related-entities) documentation section. Ignore the filter part: *When using the Query method it is usually best to turn off lazy loading for the navigation property. This is because otherwise the entire collection may get loaded automatically by the lazy loading mechanism either before or after the filtered query has been executed.* – Ivan Stoev Oct 17 '18 at 06:01
1

This is called an Abstraction Leak and it means your abstraction exposes some implementation details.

This is happening when you call the .ToList() and you switch (I don't like the word cross) between Linq to sql and Linq to objects.

I'd recommend you to read The Law of Leaky Abstractions to get the grasp better, as it is quite complicated to explain on one foot.

The main idea behind it is, that everything will work as planned but slower then usual when you attempt to provide a complete abstraction of an underlying unreliable layer, but sometimes, the layer leaks through the abstraction and you feel the things that the abstraction can’t quite protect you from.


Edit to clarify:

calling ToList() forces linq-to-entities to evaluate and return the results as a list.

Meaning that, for example from the answer above:

var blogs = context.Blogs
    .FromSql("EXECUTE dbo.GetMostPopularBlogsForUser {0}", user)
    .ToList();

Will be evaluation to the corresponding model of the context - blogs model.

So in other words, it is being lazily executed at the moment that you call ToList().

Prior to the ToList() call, C# does NO SQL calls. So actually, it is NOT an in-memory operation.

So yes, it is putting that data into memory as part of the context and reads it in the same context.

Barr J
  • 10,636
  • 1
  • 28
  • 46
  • This is a good start to an answer, but I'd like more detail on precisely *why* EF is doing this so I can better understand what's going on. When I call `.ToList()` is it putting that data into memory as part of the context or something? Does it then read that data later on in the same context? It seems to me that if it does this they would surely implement a proper mechanism to achieve this behaviour. – Jez Oct 15 '18 at 10:37
  • Edited with explanation :) – Barr J Oct 15 '18 at 10:57