0

I'm having problems with Includes when pulling data out of the entity framework. I posted a very similar question yesterday here, which has the same sample I'm about to use. I'm using Entity Framework v4.0.

I have the following simple model, a table containing list of Page Forms (~200). Each form has one or more Fields (~4000 total), and each field has may have some Parameters (~16000 total).

Entity model

I use the following code to pull data out:

EntityConnection myConnection = new EntityConnection("name=myModel");

if(conn.State != ConnectionState.Open) {
    conn.Open();
}
ObjectContext context = new ObjectContext("name=myModel");
context.ContextOptions.LazyLoadingEnabled = false;

ObjectQuery<PageForm> myObjectSet = context.CreateObjectSet<PageForm>()
                                           .Include("FormFields.FormFieldParameters");

IQueryable<PageForm> myFilteredObjectSet = myObjectSet.Where(c => c.FormID == 1);

List<PageForm> myReturnValue = myFilteredObjectSet.toList();

Which generates the following sql query

SELECT 
[Project1].[FormID] AS [FormID], 
[Project1].[FormName] AS [FormName], 
[Project1].[C2] AS [C1], 
[Project1].[FormID1] AS [FormID1], 
[Project1].[FieldID] AS [FieldID], 
[Project1].[FieldName] AS [FieldName], 
[Project1].[C1] AS [C2], 
[Project1].[FieldParamID] AS [FieldParamID], 
[Project1].[Value] AS [Value], 
[Project1].[FieldID1] AS [FieldID1]
FROM ( 
    SELECT 
    [Extent1].[FormID] AS [FormID], 
    [Extent1].[FormName] AS [FormName], 
    [Join1].[FieldID] AS [FieldID], 
    [Join1].[FieldName] AS [FieldName], 
    [Join1].[FormID] AS [FormID1], 
    [Join1].[FieldParamID] AS [FieldParamID], 
    [Join1].[Value] AS [Value], 
    [Join1].[FieldID1] AS [FieldID1], 
    CASE WHEN ([Join1].[FieldID] IS NULL) THEN CAST(NULL AS int) WHEN ([Join1].[FieldParamID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1], 
    CASE WHEN ([Join1].[FieldID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM  [dbo].[PageForm] AS [Extent1]
    LEFT OUTER JOIN (
        SELECT 
        [Extent2].[FieldID] AS [FieldID], 
        [Extent2].[FieldName] AS [FieldName], 
        [Extent2].[FormID] AS [FormID], 
        [Extent3].[FieldParamID] AS [FieldParamID], 
        [Extent3].[Value] AS [Value], 
        [Extent3].[FieldID] AS [FieldID1]
        FROM  [dbo].[FormField] AS [Extent2]
        LEFT OUTER JOIN [dbo].[FormFieldParameter] AS [Extent3] ON [Extent2].[FieldID] = [Extent3].[FieldID] 
    ) AS [Join1] ON [Extent1].[FormID] = [Join1].[FormID]
    WHERE 1 = [Extent1].[FormID]
) AS [Project1]
ORDER BY [Project1].[FormID] ASC, [Project1].[C2] ASC, [Project1].[FieldID] ASC, [Project1].[C1] ASC

Now I'm interested in this part of the query:

LEFT OUTER JOIN (
    SELECT 
    /**/
    FROM  [dbo].[FormField] AS [Extent2]
    LEFT OUTER JOIN [dbo].[FormFieldParameter] AS [Extent3] ON [Extent2].[FieldID] = [Extent3].[FieldID] 
) AS [Join1] ON [Extent1].[FormID] = [Join1].[FormID]

This seems to query the entire FormField and FormFieldParameter tables, with no filtering applied. So, what I think is happening, is that using .Include("FormField.FormFieldParameter") is akin to saying - "And return all data from these tables in the resulting query". What I really want is "Return only the data from these tables which relates to the filtered PageForm table".

Is there a way to do this? Sorry if this question sounds too simple, or similar to my previous question, but I'm really having trouble getting my head around the internals of Entity Framework.

EDIT 1:

If I change the above sample to the following which includes a where clause, then the query is orders of magnitude faster (roughly 10x faster).

LEFT OUTER JOIN (
    SELECT 
    /**/
    FROM  [dbo].[FormField] AS [Extent2]
    LEFT OUTER JOIN [dbo].[FormFieldParameter] AS [Extent3] ON [Extent2].[FieldID] = [Extent3].[FieldID] 
    WHERE 1 = [Extent2].[FormID]
) AS [Join1] ON [Extent1].[FormID] = [Join1].[FormID]

EDIT 2:

Some more information. I've figured out that If i add some funky filtering to the initial query, I can force a much more efficient query to be ran.

IQueryable<PageForm> myFilteredObjectSet = myObjectSet
                                          .Where(c => c.FormID == 1)
                                          .Where(a => a.FormFields
                                                       .Where(c => c.FormFieldParameters
                                                                    .Any(d => d.FieldID == c.FieldID))
                                                       .Any(b => b.FormID == 1)
                                           );

This causes the query to filter correctly, and runs much faster. However, I'm certain this isn't the best way go about this, as the nested Where/Any statements very quickly become a nightmare to deal with when you have more than a handful of Includes. There must be a better way?

Community
  • 1
  • 1
John
  • 1,502
  • 2
  • 13
  • 40
  • That workaround doesn't eager load your relations. Even if you add Include it will still not filter includes because that [feature is not supported](http://data.uservoice.com/forums/72025-entity-framework-feature-suggestions/suggestions/1015345-allow-filtering-for-include-extension-method). – Ladislav Mrnka Jul 31 '12 at 14:43
  • It appears to be eagerly loading everything ok for me, and the sql profiler I have running only shows a single call being made - which is when I call `.ToList()` on the Queryable object. – John Jul 31 '12 at 15:14

1 Answers1

0

The filtering is applied - your query contains WHERE 1 = [Extent1].[FormID]. It really loads only date you have requested. The query engine in the database will evaluate your query correctly and it will use some optimization techniques to filter records efficiently.

Ladislav Mrnka
  • 360,892
  • 59
  • 660
  • 670
  • I'm sorry, but I'm sure this is not correct. If I manually update the query to include an inner where clause, the query runs much faster. I'll update my question to show this. – John Jul 31 '12 at 13:18
  • Well, it now becomes completely different question because your former question was about returning wrong data and new question is about different performance. Yes EF queries may have much worse performance but the only workaround is not using Linq and EF and writing your own SQL queries. – Ladislav Mrnka Jul 31 '12 at 13:55
  • I think you misunderstand - the data that I'm getting back in exactly as expected, in both situations. The issue is that the query generated is selecting from the entire 'FormField' table, when it should only be selected from the table where `FormID == 1`. I know there is an overhead in using EF, but this is such a large performance hit, that I am sure I must be doing something wrong. – John Jul 31 '12 at 14:21
  • See my question for a workaround, which works, but is ugly as sin. I'm sure there must be a better way of doing this. – John Jul 31 '12 at 14:40
  • Unfortunately you are not doing anything wrong. You can try to use .NET 4.5 - there can be some performance improvements, report it to [MS Connect](http://connect.microsoft.com/) as a bug, report it to [Data UserVoice](http://data.uservoice.com/forums/72025-ado-net-entity-framework-ef-feature-suggestions) as request for improvement or open design discussion on [EF CodePlex](http://entityframework.codeplex.com/). You can also download EF source code from codeplex and try to fix it yourselves. – Ladislav Mrnka Jul 31 '12 at 14:42