2

I have entity Post that has one-to-many relations with Author and Comment. I would like to load all Posts and joined them with the first Author and all the Comments. The code with Include would look like this:

Post[] posts = ctx.Posts.Include(p => p.Authors.Take(1)).Include(p => p.Comments).ToArray();

There is a cartesian explosion issue with this query. If Post owns n Comments, Author and Comment are going to be repeated n times in the result set.

Solution #1

In EF Core 5.0, I could use a Split Query but then that would generate 3 queries when I would like to load Post with Author first then all Comments.

Solution #2

First, load Post with Author then iterate on the post to explicitly load their comments but that would generate n + 1 queries.

Post[] posts = ctx.Posts.Include(p => p.Authors.Take(1)).ToArray();
foreach (Post post in posts)
  ctx.Entry(post).Collection(p => p.Comments).Load();

Solution #3

First, load Post with Author then gather all post ids to generate a single query to load comments.

Dictionary<int, Post> postsById = ctx.Posts.Include(p => p.Authors.Take(1)).ToDictionnary(p => p.Id);
Comment[] comments = ctx.Comments.Where(c => postsById.ContainsKey(c.PostId)).ToArray();
foreach (Comment comment in comments)
  postsById[comment.PostId].Comments.Add(comment); // How to avoid re-adding comment?

This solution would generate only 2 queries without any duplicated data but how can I avoid the comments to be added again to the post? Is there a better way than the 3 proposed solutions?

Greg
  • 792
  • 2
  • 9
  • 24
  • EF knows how to deal with the result set. You always get unique posts. Seems like you're trying to solve a non-existing problem. – Gert Arnold Dec 11 '20 at 14:48
  • I had a similar problem - the cross joins from `Include` will eventually cause enormous result sets from the SQL query as the number of related records increases. I ultimately switched to ADO for these types of queries - this let me split the queries manually and also run them all concurrently (since EF is not thread-safe). The performance gains were night-and-day. – crgolden Dec 11 '20 at 14:51
  • @GertArnold my issue here is not correctness but performance. I want one query for both post and author and a second query for the comments. I noticed that my example was oversimplified. For one-to-one relationships, EF core knows that it doesn't have to create a new query but can simply use a JOIN. I edited my example to reflect my problem. Post has now a one-to-many relation with Author and I want posts with their first author and all their comments. – Greg Dec 11 '20 at 15:59
  • @crgolden I would like to avoid writing any SQL because I use an in-memory provider for development environment. – Greg Dec 11 '20 at 16:00
  • Yes, I get that now, but it wasn't apparent from your first description which seems to focus on data replication (which of course hits performance, esp. with repeated long strings). – Gert Arnold Dec 11 '20 at 16:01
  • Including single item from one-to-many relationship doesn't seem to be a common case (I would rather say it should be quite rare), so optimizing for that case doesn't seem to worth efforts. I think standard split to 3 queries should be just fine. – Ivan Stoev Dec 11 '20 at 18:34
  • @IvanStoev Right but for my culture, would you know how to make the third solution work? – Greg Dec 11 '20 at 18:38

2 Answers2

3

I would add another option. Since I'm the author of Eager Loading in linq2db. I'm pretty sure that it will run only two queries.

So just install this extension linq2db.EntityFrameworkCore (3.x version for EF Core 3.1.x and 5.x version for EF Core 5.x)

And try this query:

Post[] posts = ctx.Posts
   .Include(p => p.Author)
   .Include(p => p.Comments)
   .ToLinqToDB()
   .ToArray();

Also this approach should work with AutoMapper's ProjectTo and fully custom projection. I know that custom projection is not working with AsSplitQuery (since I tried it)

Svyatoslav Danyliv
  • 21,911
  • 3
  • 16
  • 32
  • Looks like all your answers to EFC related issues are to use Linq2Db bridge :-) Which I should admit could the the "right" thing. – Ivan Stoev Dec 11 '20 at 15:13
  • Why not. People move to the Dapper because EF Core won't to support something or doing that like a newbie. – Svyatoslav Danyliv Dec 11 '20 at 15:15
  • That's what I'm saying. It's not normal for me to rely on 3rd party packages for providing basic functionality, but that's the reality with EF Core (unfortunately) even after several years of development. The only problem is the need of explicit `ToLinqToDb()` call, it could have been much easier for us (and for them as well) if they exposed an easy way of replacing the provider, but... – Ivan Stoev Dec 11 '20 at 15:18
  • 1
    Even worse, occasionally I'm monitoring their issues. It is terrible 20 seconds for translating LINQ query with 5-6 includes. Good news here that they do not harm Database Server when doing that ;) – Svyatoslav Danyliv Dec 11 '20 at 15:22
  • Indeed :-( Regarding one the other issues you've answered today, instead of NeinLinq and LinqKit you may find interesting DelegateDecompiler approach, as well as my attempt with plugging query preprocessor [here](https://stackoverflow.com/questions/62115690/ef-core-queries-all-columns-in-sql-when-mapping-to-object-in-select/62138200#62138200) and my failed attempts to convince EF team [here](https://github.com/dotnet/efcore/issues/19748). At least the issue is still "open" :-) – Ivan Stoev Dec 11 '20 at 15:29
  • 1
    @IvanStoev, fight with them ;) I'm really cannot understand why it is forbidden. It is fu..n easy and saves lives (weeks). Usually our extension users say exact words: "your extension is lifesaver" and I'm proud of that ;) – Svyatoslav Danyliv Dec 11 '20 at 15:44
  • 2
    @GertArnold, got it. I'm new in SO, some nuances are still not intuitive. – Svyatoslav Danyliv Dec 11 '20 at 15:50
  • If you are in any way or form affiliated with the plugin/website/blog/product/project etc that you have linked then please mention your affiliation clearly in the answer itself. Undisclosed affiliation will be considered spam and deleted. Please read [How to not be a spammer](https://stackoverflow.com/help/promotion) and [Can I support my product on this site?](https://stackoverflow.com/help/product-support). – Sabito stands with Ukraine Dec 11 '20 at 15:50
  • It is good to read through those links... Also please edit all answers that come under this criteria.. – Sabito stands with Ukraine Dec 11 '20 at 15:52
  • @Svyatoslav For this particular issue though, try OP original query (having `.Include(p => p.Authors.Take(1))`) with your approach. I see database 3 queries and get exception at the end of the last query execution (raised `at LinqToDB.Common.ConvertBuilder.ConvertDefault(Object value, Type conversionType)`) – Ivan Stoev Dec 11 '20 at 19:09
  • @IvanStoev, for every collection there will be one query. According to your exception, could you please create issue in github? – Svyatoslav Danyliv Dec 12 '20 at 07:04
  • I used linq2db.EntityFrameworkCore to deal with cartesian explosion on a query I had with multiple joins. Was easy to set up and was a massive improvement in the query run execution time – user2590928 Aug 10 '21 at 00:25
1

I have found a way for the solution #2 to work with only two queries here: https://github.com/dotnet/efcore/issues/7350.

int postIds = new[] { 3, 4 };
Post[] posts = ctx.Posts
    .Include(p => p.Authors.Take(1))
    .Where(p => postIds.Contains(p.Id))
    .ToArray();

// This line automatically populates posts comments in the same DbContext.
ctx.Comments
    .Where(c => postIds.Contains(c.PostId))
    .Load();
Greg
  • 792
  • 2
  • 9
  • 24
  • Note that it only works if tracking is enabled; if you disable it (with `AsNoTracking()` on the query or at the DbContext level), EF Core won't fix up the navigation properties. – Thomas Levesque Aug 17 '21 at 15:46