How can I optimize my EF Code First query?

Question

[Updated - see update at bottom]

I am using EF code-first, and am generally happy with it. However, one simple (and common) operation is causing EF to generate ridiculously complex SQL, which is slowing my application down.

I am simply fetching a set of entities using a list of (integer) IDs, but because I need details of lots of sub-entities, I'm using .Include() to get those sub-entities loaded at the same time, as follows:

db.MyEntities
    .Where(x => x.ClientId == clientId)
    .Where(x => ids.Contains(x.Id))
    .Where(x => x.SubEntity1 != null)
    .Include(x => x.SubEntity1)
    .Include(x => x.SubEntity1.SubSubEntity1)
    .Include(x => x.SubEntity1.SubSubEntity2)
    .Include(x => x.SubEntity1.SubSubEntity3)
    .Include(x => x.SubEntity1.SubSubEntity4)
    .Include(x => x.SubEntity2)
    .Include(x => x.SubEntity2.SubSubEntity1)
    .Include(x => x.SubEntity2.SubSubEntity2)
    .Include(x => x.SubEntity2.SubSubEntity3)
    .Include(x => x.SubEntity2.SubSubEntity4)
    .Include(x => x.SubEntity3)

As you can see, it's not a particularly complex query, with the exception of all those Includes.

The SQL that EF generates for this is huge - around 74Kb of SQL. It doesn't take very long to execute (since normally the number of items in the list of IDs is small), but it takes EF more than a second just to construct the query - i.e. before the query is even sent to the database.

If I remove the Includes, then the query is much smaller, and the whole thing takes much less time - but the various related entities are then loaded one-at-a-time, which doesn't scale well.

EF seems to give me two options for loading the data:

Load all the sub-entities at once during the initial query (using Include as above), or
Load the sub-entities one-at-a-time (using lazy loading, or explicitly using Load/LoadProperty).

Option 1 would be my preferred option if it worked, but since that doesn't work in this case, my only remaining option is 2 - and I don't think that's acceptable: there would be too many database queries where the input list of IDs (i.e. the number of entities) is large.

It seems to me that there is another option that EF doesn't seem to address: having fetched the main entities, fetch all the relevant SubEntity1 entities, then all the relevant SubEntity2 entities, etc. That way, the number of queries is related to the number of types of entity to be fetched, rather the number of entities. This would scale much better.

I can't see a way to do that in EF: in other words, to say "load this property for all these entities (in a single query)".

Will I just have to give up on EF and write my own SQL?

UPDATE I have noticed that even if I remove the Includes, the SQL generated is more complex than I think it should be, and I think this all stems from the fact that EF does not 'like' my table structure. I struggled for days to get EF to create the database structure I was looking for via Code First (and the Fluent API), and even when I had got to (nearly) where I wanted to be, I had to accept some compromises.

I think I'm now paying a further penalty for daring to do something that EF didn't want me to do. It looks like a simple query is more complex than it should be, and a slightly-more-complex query is massively more complex.

This is incredibly depressing - I thought I'd left all those EF hassles behind, and the system is now in production with dozens of users - which would make it very difficult for me to start over.

It seems I'm going to have to spend the eternity fighting EF tooth and nail at every turn. How I wish I'd never used it in the first place!

Anyway, back to my original question: if I have a bunch of entities of type A for which I want to load the related sub-entities of type B in one query, is there a way to do that?

Why is the query that huge? Any unexpected constructs? Or is it just that the list of fields in the SELECT is a mile long? — Gert Arnold, Aug 08 '13 at 14:46
@GertArnold: Good question, and yes there are a lot of fields in the SELECT. And of course there are a lot of JOINs to the various sub-entity tables. However, there seem to be an awful lot of nested statements, and repeated JOIN-ing of the same tables. Far more than I would need if I constructed the query myself. — Gary McGill, Aug 08 '13 at 14:51
Does your application target .NET 4.5? Or is .NET 4.5 at least installed on the machine where the app is running? In that case you should notice the more than 1 second delay only for the first run of the query because EF 5 caches translated LINQ queries with .NET 4.5 after they have been transformed to SQL once. — Slauma, Aug 08 '13 at 18:24
@Slauma: I have 4.5, but the query does not appear to be cached (or rather, the cached query is not hit). Looking at the SQL, it has some parameters in it but also some hard-coded IDs, so I think that as the IDs change the query will also change. — Gary McGill, Aug 08 '13 at 19:23
I see. Could you replace the hardcoded values in the LINQ query by local variables? They should get translated into sql parameters. — Slauma, Aug 08 '13 at 19:40
@Slauma: It's not under my control; the SQL is created by EF. It seems to turn single values into SQL parameters, but because I have a list of IDs it's turning that into a SQL IN(x,y,z) where the x,y and z values are in the SQL rather than being parameterized. I guess this is because SQL has no "list" parameter type. — Gary McGill, Aug 09 '13 at 09:43
SQL IN is a bad idea: http://stackoverflow.com/questions/4722220/sql-in-clause-1000-item-limit — Mariusz.W, Aug 09 '13 at 10:59
@Mariusz.W: To be clear: it's not me that's producing the "IN". EF is doing it in response to me using Ids.Contains(...). — Gary McGill, Aug 09 '13 at 12:18
@Gary, yeah I got that. Just saying that at the very start we know the generated query won't be the next Usain Bolt ;) — Mariusz.W, Aug 09 '13 at 12:48

score 3 · Accepted Answer · answered Aug 09 '13 at 08:39

3

How about loading the data using stored procedures? Yes, it is a bit dirty, but this is what I do when I hit performance issues with EF. I hope I'm not missing something in your question.

http://msdn.microsoft.com/en-US/data/jj691402

answered Aug 09 '13 at 08:39

Mariusz.W

1,347
12
19

Thanks. I could use an SP (or just some SQL) and it looks like this Translate method is a good way for me to get the data back into the EF proxies. What worries me is the bit that says " if you have lazy loading enabled, accessing the posts property on one of the blog entities then EF will connect to the database to lazily load all posts, even though we have already loaded them all. This is because EF cannot know whether or not you have loaded all posts or if there are more in the database. If you want to avoid this then you will need to disable lazy loading". Yikes :-( – Gary McGill Aug 09 '13 at 10:41
1

Disable lazy loading then while you work on the object(s). If you loaded everything you need, you won't notice the difference. If you forgot about loading something - you will get null pointer exception or some logical error. This means you will have to either eager load it or explicitly load when you need it. EF makes the access to data easy to program but at the cost of performance. If you want performance you will have to "work harder". No offence intended lol :) From my experience 10 includes is a no go with EF. – Mariusz.W Aug 09 '13 at 10:55
1

Thanks, this is what I ended up doing. I removed the `virtual` modifier on the collections so that they are not lazy-loaded, to prevent EF re-loading data I've already retrieved via my SP. – Gary McGill Aug 21 '13 at 11:16

How can I optimize my EF Code First query?

1 Answers1

Linked