5

I have a parent entity with a navigation property to a child entity. The parent entity may not be removed as long as there are associated records in the child entity. The child entity can contain hundreds of thousands of records.

I'm wondering what will be the most efficient to do in Entity Framework to do this:

var parentRecord = _context.Where(x => x.Id == request.Id)
                           .Include(x => x.ChildTable)
                           .FirstOrDefault();

// check if parentRecord exists

if (parentRecord.ChildTable.Any()) {
  // cannot remove
}

or

var parentRecord = _context.Where(x => x.Id == request.Id)
                            .Select(x => new {
                                  ParentRecord = x,
                                  HasChildRecords = x.ChildTable.Any()
                            })
                           .FirstOrDefault();

// check if parentRecord exists

if (parentRecord.HasChildRecords) {
  // cannot remove
}

The first query may include thousands of records while the second query will not, however, the second one is more complex.

Which is the best way to do this?

Bv202
  • 3,924
  • 13
  • 46
  • 80
  • 1
    That's not what `Include()` is for. It tells EF that you want to pull child records for processing. If you *don't* child entities, don't use Include – Panagiotis Kanavos Feb 01 '18 at 12:43
  • The queries aren't the same anyway. The first load all child entities for the first parent. The second only returns a flag if there are any child entities – Panagiotis Kanavos Feb 01 '18 at 12:44
  • I don't need the child entities for processing, but I need to check if there is at least one of them. So in this case, the second query would be preferrable? – Bv202 Feb 01 '18 at 12:49
  • That's not what the first query asks. It asks for all children of the first result. That's the reason you should use the second query, not `Include()`. I'd bet that if you added `Include()` to the second query EF wouldn't change the query at all. it already checks for child entities but doesn't need to return them – Panagiotis Kanavos Feb 01 '18 at 12:53
  • I'd think the second query is far more performant, simply because you don't materialize that much data. You simply pass the processing work to the DBMS, which should be faster depending on the DBMS and where the DB is. – DevilSuichiro Feb 01 '18 at 12:57
  • @DevilSuichiro the second query is more performant because it asks the write question. The first asks a completely different question. It has nothing to do with the database – Panagiotis Kanavos Feb 01 '18 at 12:58
  • 2
    if talking about hundreds of thousands of entries, it is better not to use the EF at all – aleha_84 Feb 01 '18 at 13:06
  • 1
    @PanagiotisKanavos not necessarily. EF is not smart enough to know that the second query would be a simple join. Depending on DBMS materializing data may be more performant than the building of the (possibly huge) temporary table on DBMS side. – DevilSuichiro Feb 01 '18 at 13:08
  • @PanagiotisKanavos Please post a wrong answer so I can downvote it instead of using comments – Joe Phillips Feb 01 '18 at 14:11

2 Answers2

0

I would say it depens. It depends on which DBMS you're using. it depends on how good the optimizer works etc. So one single statement with a JOIN could be far faster than a lot of SELECT statements.

In general I would say when you need the rows from your Child table use .Include(). Otherwise don't include them. Or in simple words, just read the data you need.

user743414
  • 936
  • 10
  • 23
  • If EF didn't already perform a JOIN, the second query wouldn't be able whether there are any child entities. `Include` **includes** child entities in the results. It doesn't perform joins – Panagiotis Kanavos Feb 01 '18 at 12:57
  • @PanagiotisKanavos So you're telling me that including results from one table and another into one result set isn't a join right? – user743414 Feb 01 '18 at 12:58
  • Unrelated. Include has nothing to do with JOINs. The second query also performs JOINs – Panagiotis Kanavos Feb 01 '18 at 12:59
  • 1
    @PanagiotisKanavos `Include()` has nothing to do with JOINS? It creates the JOIN. The second query creates one or more subselects, which in case depends on the used DBMS how it is executed. An executing more statements in general gives worse performance. – user743414 Feb 01 '18 at 13:04
  • 1
    @PanagiotisKanavos `Include()` does a left outer join as noted here https://stackoverflow.com/a/4299667/4903560, I'm afraid you are incorrect. – Joe Feb 01 '18 at 13:34
0

The answer depends on your database design. Which columns are indexed? How much data is in table?

Include() offloads work to your C# layer, but means a more simple query. It's probably the better choice here but you should consider extracting the SQL that is generated by entity framework and running each through an optimisation check.

You can output the sql generated by entity framework to your visual studio console as note here.

This example might create a better sql query that suites your needs.

Joe
  • 1,847
  • 2
  • 17
  • 26
  • No it doesn't. `Include` *includes* child entities in the results. There's no such thing as "offloading work to C#". Pulling rows to the client for filtering without indexes, instead of using the database causes sever performance problems. – Panagiotis Kanavos Feb 01 '18 at 12:51
  • BTW the linked answer is unrelated to this question – Panagiotis Kanavos Feb 01 '18 at 12:55
  • 1
    `Incluce` in this instance pulls the rows to the client, which means the client then has to perform the filter itself. Which is what I meant for offloading work to the C# layer. And I think the linked answer does show how using `Any` within a `Select` in entity framework produces poor performing SQL which can be optimised as demonstrated. Which is of value to this question asking about performance related to using `Any` within a `Select` in entity framework. – Joe Feb 01 '18 at 13:30