1

I've got a fairly generic CRUD webapp, which generates pages dynamically according to the contents of several database tables. I'm using Entity Framework 4.0 to pull this data out of the DB, however I'm running into severe performance problems. I've managed to iterate down into a problem which is contained enough that I can detail below.

I have a table containing list of Page Forms (~200). Each form has one or more Fields (~4000 total), and each field has may have some Parameters (~16000 total).

I've attached a screenshot of my model below:

Entity model

The associated entity objects are as follows:

public class Form
{
    public int FormID { get; set; }
    public string FormName { get; set; }

    public IList<FormField> FormFields { get; set; }

}

public class FormField
{
    public int FieldID { get; set; }
    public string FieldName { get; set; }
    public int FormID{ get; set; } 

    public IList<FormFieldParameter> FormFieldParameters { get; set; }
    public Form ParentForm { get; set; }

}

public class FormFieldParameter
{
    public int FieldParamID{ get; set; }
    public string Value{ get; set; }
    public int? FieldID { get; set; }

    public FormField ParentField { get; set; }
}

The following code pulls out all data for the Form which has an ID of '1'.

EntityConnection myConnection = new EntityConnection("name=myModel");

if(conn.State != ConnectionState.Open) {
    conn.Open();
}
ObjectContext context = new ObjectContext("name=myModel");
context.ContextOptions.LazyLoadingEnabled = false;

ObjectQuery<PageForm> myObjectSet = context.CreateObjectSet<PageForm>()
                                           .Include("FormField.FormFieldParameter");

//Edit: I missed this part out, sorry. In hindsight, this was exactly what was
//causing the issue.
IEnumerable<PageForm> myObjectSetEnumerable = myObjectSet.AsEnumerable();
IQueryable<PageForm> myFilteredObjectSet = myObjectSetEnumerable.Where(c => c.FormID == 1)
                                                                .AsQueryable();


List<PageForm> myReturnValue = myFilteredObjectSet.toList();

Now, while this does work, it runs really poorly. The query takes over a second to run, the entirety of which is spent in the myFilteredObjectSet.toList() call. I ran a profiler on my database to see what was causing the delay, and found that the following query was being generated:

SELECT 
[Project1].[FormID] AS [FormID], 
[Project1].[FormName] AS [FormName], 
[Project1].[C2] AS [C1], 
[Project1].[FormID1] AS [FormID1], 
[Project1].[FieldID] AS [FieldID], 
[Project1].[FieldName] AS [FieldName], 
[Project1].[C1] AS [C2], 
[Project1].[FieldParamID] AS [FieldParamID], 
[Project1].[Value] AS [Value], 
[Project1].[FieldID1] AS [FieldID1]
FROM ( SELECT 
    [Extent1].[FormID] AS [FormID], 
    [Extent1].[FormName] AS [FormName], 
    [Join1].[FieldID] AS [FieldID], 
    [Join1].[FieldName] AS [FieldName], 
    [Join1].[FormID] AS [FormID1], 
    [Join1].[FieldParamID] AS [FieldParamID], 
    [Join1].[Value] AS [Value], 
    [Join1].[FieldID1] AS [FieldID1], 
    CASE WHEN ([Join1].[FieldID] IS NULL) THEN CAST(NULL AS int) WHEN ([Join1].[FieldParamID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1], 
    CASE WHEN ([Join1].[FieldID] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM  [dbo].[PageForm] AS [Extent1]
    LEFT OUTER JOIN  (SELECT [Extent2].[FieldID] AS [FieldID], [Extent2].[FieldName] AS [FieldName], [Extent2].[FormID] AS [FormID], [Extent3].[FieldParamID] AS [FieldParamID], [Extent3].[Value] AS [Value], [Extent3].[FieldID] AS [FieldID1]
        FROM  [dbo].[FormField] AS [Extent2]
        LEFT OUTER JOIN [dbo].[FormFieldParameter] AS [Extent3] ON [Extent2].[FieldID] = [Extent3].[FieldID] ) AS [Join1] ON [Extent1].[FormID] = [Join1].[FormID]
)  AS [Project1]
ORDER BY [Project1].[FormID] ASC, [Project1].[C2] ASC, [Project1].[FieldID] ASC, [Project1].[C1] ASC

The duration of this query shown on the sql profiler shows that this query is what is taking so long to run. The interesting thing about the query, is that there is no filtering on it at all - It is returning the entire tree! I can't understand why it is returning everything, as the filter myObjectSet.Where(c => c.FormID == 1) is pretty explicit. The actual returned object only contains a single entry, which I would expect.

I'm having this problem across my entire data access layer, and its performance is appalling. I have no idea why the generated query doesn't contain the filter - and no idea how to tell it to do so. Does anybody know the answer?

John
  • 1,502
  • 2
  • 13
  • 40
  • 2
    You need .AsIQueryable on CreateObjectSet – Maess Jul 30 '12 at 18:43
  • 1
    Have you read [this](http://stackoverflow.com/questions/5007325/entity-framework-generic-repository-pattern-and-strange-sql-generation)? – Sergei Rogovtcev Jul 30 '12 at 19:07
  • Wow thanks. Adding .AsQueryable to the end of CreateObjectSet did work. I thought that since the .Where statement returned an IQueryable it would be ok. I guess not! Do you know the reason for this behaviour? – John Jul 30 '12 at 19:22
  • 1
    Is the LINQ query code block *exactly* the code you are using? Or do you retrieve parts of that from methods or pass it into methods? I just tested it and I cannot confirm the need of `AsQueryable()` (which would really confuse me). Using your code above creates a SQL query that *has* the expected WHERE clause: Right below the second LEFT OUTER JOIN I get a `WHERE 1 = [Extent1].[FormID]`. – Slauma Jul 30 '12 at 20:24
  • I'm sorry, It is a method call I am using to perform the filtering, and I've not stated exactly what it is doing. Basically it converts the ObjectQuery to an IEnumerable, performing the filter, then converying back to an IQueryable... Makes perfect sense why it is failing now that I've read the linked questions. I'll update my question with the missing information. – John Jul 30 '12 at 22:49

1 Answers1

3

TL;DR Remove the AsEnumerable call and replace it with an AsQueryable call and it should resolve most of the performance issues (outside of actual database execution cost being slow, which is fixed by adding indexes on columns you are filtering / joining on).

Explanation of what is actually happening...

As soon as you call AsEnumerable you are now outside of Entity Framework and in the world of LINQ-to-objects. That means it's going to execute the query against the database when it is enumerated against. It doesn't matter that you call AsQueryable again, that merely means that you are creating a query against an in memory structure.

The effective execution is this.

  1. Create an object query, including all FormFieldProperties linked to the form
  2. Transform the current IQueryable instance into an enumerable.
  3. Add a predicate against the enumerable instance which will only return items whose FormID value is one.
  4. Call ToList, which copies all values from source enumerable to a list.

Now, up until step 4, the query actually hasn't queried the database. When you call ToList, it executes the query in step one (as you see). This query likely is expensive and takes a while because of the amount of data it is returning and/or missing indexes that may improve it's performance.

Once that query is done and materialized, it's result is wrapped in an enumerator.

Now, every object is iterated and checked to see if it matches the predicate that was added in step 3. If it does match, then it is returned to whoever is iterating over it (in this case, the ToList function).

Now that the value has been returned, it is added to the list that is being created with the values.

Finally, you get a list back from the ToList method, and it has exactly what you asked for, but it did all of that in memory rather than in the database.

Darren Kopp
  • 76,581
  • 9
  • 79
  • 93