I have the following code, which is misbehaving:

TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
             where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);

The first line, UserManager.GetUser, just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.

First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK which is linked to that user, which is sometimes tens of thousands of rows:

SELECT 
 -- Columns
 FROM  TPMDBO.TPM_USERTASKS "Extent1"
 INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
 WHERE "Extent1".USERID = :EntityKeyValue1

This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.

Next, it seems to execute a new query for each distinct (PROJECTID, VERSIONID) pair among the tasks returned above, to load the matching TPM_PROJECTVERSION row:

SELECT 
 -- Columns
 FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
 WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)

Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.

The query I would like to generate would look something like:

SELECT 
 -- Columns
 FROM  TPMDBO.TPM_USERTASKS "Extent1"
 INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
 INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
 WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10

The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on navigation properties. In other words, I can't do:

from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")

Is there any way to optimize this LINQ statement? I'm using .NET 4 and Oracle as the backend DB.

Solution:

This solution is based on Kirk's suggestions below. It filters through the TPM_USER navigation property with Any, since context.TPM_USERTASK cannot be queried directly:

var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
             where t.TPM_USER.Any(y => y.USERID == UserId) &&
             t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
             orderby t.DUEDATE, t.PROJECTID
             select t);

It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.

Mike Christensen

1 Answer

Yes, you are pulling down a specific user, and then referencing the relationship TPM_TASK. That it is pulling down every task attached to that user is exactly what it's supposed to be doing. There's no ORM SQL translation when you're doing it this way. You're getting a user, then getting all his tasks into memory, and then performing some client-side filtering. This is all done using lazy-loading, so the SQL is going to be exceptionally inefficient as it can't batch anything up.

Instead, rewrite your query to go directly against TPM_TASK and filter against the user:

var tasks = (from t in context.TPM_TASK
         where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
         orderby t.DUEDATE, t.PROJECTID
         select t);

Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.
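
To make the distinction concrete, here is a minimal sketch that needs no database. It assumes a hypothetical TaskRow class standing in for the TPM_TASK entity, and uses a plain List plus AsQueryable() in place of the EF collection and the context set. As far as I know, a navigation collection such as user.TPM_TASK is only an IEnumerable, so its `where` binds to Enumerable.Where and runs in memory, while context.TPM_TASK is an IQueryable, so the same `where` is captured as an expression tree that the provider can turn into SQL:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the TPM_TASK entity; only the column used here.
public class TaskRow
{
    public int STAGEID { get; set; }
}

public static class Demo
{
    // Source is IEnumerable<T>: the compiler picks Enumerable.Where, and the
    // predicate is a compiled delegate that runs over already-loaded rows.
    public static IEnumerable<TaskRow> FilterInMemory(IEnumerable<TaskRow> source)
    {
        return from t in source
               where t.STAGEID > 0 && t.STAGEID != 3
               select t;
    }

    // Source is IQueryable<T>: the compiler picks Queryable.Where, and the
    // predicate is stored as an expression tree a provider could translate.
    public static IQueryable<TaskRow> FilterTranslatable(IQueryable<TaskRow> source)
    {
        return from t in source
               where t.STAGEID > 0 && t.STAGEID != 3
               select t;
    }

    public static void Main()
    {
        var rows = new List<TaskRow>
        {
            new TaskRow { STAGEID = 0 },
            new TaskRow { STAGEID = 3 },
            new TaskRow { STAGEID = 5 },
        };

        // Reports whether the in-memory result is still a translatable query.
        Console.WriteLine(FilterInMemory(rows) is IQueryable<TaskRow>);

        // Prints the expression tree captured for the IQueryable version.
        Console.WriteLine(FilterTranslatable(rows.AsQueryable()).Expression);
    }
}
```

The query text is identical in both methods; only the static type of the source differs. That is why starting from context.TPM_TASK instead of user.TPM_TASK is enough to move the filtering to the database.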

Kirk Woll
  • Unfortunately, this idea won't work. `TPM_TASK` has no `USERID` property. Users relate to tasks through the `TPM_USERTASK` table, which you can't query directly since it's used in a many-to-many relationship. Maybe I'd be better off creating a view in the database or something? – Mike Christensen Jun 20 '12 at 20:00
  • You can certainly write the query to go against `TPM_USERTASK` as well. I'm not sure how exactly your schema is set up, but something along the lines of `where t.TPM_TASK.Any(y => y.USERID == t.USER)` or just use `join` to bring in the many to many. You definitely do not need to be messing with views to achieve this. – Kirk Woll Jun 20 '12 at 20:09
  • Yup, thanks for your clear explanation! That's pretty much what I figured was going on, but I'm still fairly new to EF. Sometimes it's hard to see what gets magically turned into SQL and what gets handled manually by the runtime. LINQ sure lets you shoot yourself in the foot easily. – Mike Christensen Jun 20 '12 at 20:35
  • @Mike, it does; with EF you can [disable the lazy loading](http://stackoverflow.com/questions/2967214/disable-lazy-loading-by-default-in-entity-framework-4). That way you'd just get null reference exceptions instead of poor performance. Since exceptions are obvious and poor performance can be subtle, this might be something worth considering. If you disable lazy loading, you'll have to make sure to also learn about the `.Include` function (and ideally the [variant with lambdas](http://stackoverflow.com/questions/4544756/using-include-in-entity-framework-4-with-lambda-expressions)). – Kirk Woll Jun 20 '12 at 20:42
  • Never knew you could use lambda expressions in the `Include` method, that's good to know! – Mike Christensen Jun 20 '12 at 20:55
  • @Mike only since 4.1; before that, `Include` takes dot-delimited navigation property strings – AakashM Jun 21 '12 at 08:01