5

I was asked to produce a report that is driven by a fairly complex SQL query against a SQL Server database. Since the site of the report was already using Entity Framework 4.1, I thought I would attempt to write the query using EF and LINQ:

var q = from r in ctx.Responses
                    .Where(x => ctx.Responses.Where(u => u.UserId == x.UserId).Count() >= VALID_RESPONSES)
                    .GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
         orderby r.FirstOrDefault().User.AwardCity, r.FirstOrDefault().Category.Label, r.Count() descending
         select new
         {
             City = r.FirstOrDefault().User.AwardCity,
             Category = r.FirstOrDefault().Category.Label,
             Response = r.FirstOrDefault().ResponseText,
             Votes = r.Count()
         };

This query tallies votes, but only from users who have submitted a certain number of required minimum votes.

This approach was a complete disaster from a performance perspective, so we switched to ADO.NET and the query ran very quickly. I did look at the LINQ generated SQL using the SQL Profiler, and although it looked atrocious as usual I didn't see any clues as to how to optimize the LINQ statement to make it more efficient.

Here's the straight TSQL version:

WITH ValidUsers(UserId)
AS
(
    SELECT UserId
    FROM Responses
    GROUP BY UserId
    HAVING COUNT(*) >= 103
)
SELECT d.AwardCity
    , c.Label
    , r.ResponseText
    , COUNT(*) AS Votes
FROM ValidUsers u
JOIN Responses r ON r.UserId = u.UserId
JOIN Categories c ON r.CategoryId = c.CategoryId
JOIN Demographics d ON r.UserId = d.Id
GROUP BY d.AwardCity, c.Label, r.ResponseText
ORDER BY d.AwardCity, s.SectionName, COUNT(*) DESC

What I'm wondering is: is this query just too complex for EF and LINQ to handle efficiently or have I missed a trick?

Paul Keister
  • 12,851
  • 5
  • 46
  • 75
  • I'm guessing all the FirstOrDefaults are causing it. Have you tried adding a `let response = r.First()` before the groupby? Or swapping the Select and the OrderBy? Like this http://stackoverflow.com/a/5013740/736079 – jessehouwing Jan 17 '13 at 22:32
  • 2
    Is there a navigation property like `User.Responses`? – Gert Arnold Jan 17 '13 at 22:37
  • @jessehouwing using let response helps considerably, although the LINQ version is still much slower than ADO.NET. If you enter this as an answer I'll at least upvote it. I'm having problems with Jon Skeet's strategy of swapping select and order by, mainly I can't figure out how to get the count with this construct. – Paul Keister Jan 18 '13 at 00:58
  • I'm also having trouble wrapping my head around the query. It might help if you share the plain SQL for us to see. – jessehouwing Jan 18 '13 at 01:24
  • 1
    @GertArnold yes, I did confirm that adding a navigation property also help by a factor of 3. – Paul Keister Jan 18 '13 at 01:46

2 Answers2

4

Using a let to reduce the number of r.First()'s will probably improve performance. It's probably not enough yet.

 var q = from r in ctx.Responses
                .Where()
                .GroupBy()
     let response = r.First()
     orderby response.User.AwardCity, response.Category.Label, r.Count() descending
     select new
     {
         City = response.User.AwardCity,
         Category = response.Category.Label,
         Response = response.ResponseText,
         Votes = r.Count()
     };
jessehouwing
  • 106,458
  • 22
  • 256
  • 341
  • I'm marking this as the answer because it's effect is roughly equivalent to the navigation property solution proposed by @GertArnold, but Gert hasn't posted his comment as an answer yet (sorry Gert, I will upvote you). Should note that even with both optimization applied ADO.NET is still faster, but the revised LINQ is orders of magnitude faster than it was. – Paul Keister Jan 18 '13 at 01:53
1

Maybe this change improve the performance, removing the resulting nested sql select in the where clause

First get the votes of each user and put them in a Dictionary

var userVotes = ctx.Responses.GroupBy(x => x.UserId )
                             .ToDictionary(a => a.Key.UserId,  b => b.Count());

var cityQuery = ctx.Responses.ToList().Where(x => userVotes[x.UserId] >= VALID_RESPONSES)
               .GroupBy(x => new { x.User.AwardCity, x.Category.Label, x.ResponseText })
               .Select(r => new
                       {
                           City = r.First().User.AwardCity,
                           Category = r.First().Category.Label,
                           Response = r.First().ResponseText,
                           Votes = r.Count()
                       })
               .OrderByDescending(r => r.City, r.Category, r.Votes());
Esteban Elverdin
  • 3,552
  • 1
  • 17
  • 21
  • I had tried this approach already. Here is the error:LINQ to Entities does not recognize the method 'Int32 get_Item(Int32)' method, and this method cannot be translated into a store expression – Paul Keister Jan 18 '13 at 01:15
  • I think I missed to to add ToList() after ctx.Responses, resulting in ctx.Responses.ToList().Where(.... – Esteban Elverdin Jan 18 '13 at 01:18
  • I think the core problem is that the EF LINQ driver can't use a reference to an external dictionary while it's building the entity SQL. I'm pretty sure I copied and pasted your solution literally, all I did was fix the syntax of the first ToDictionary lambda. – Paul Keister Jan 18 '13 at 02:04
  • I tested it with in memory objects and worked. I thought adding ToList will bring all your records to memory and then you can use the dictionary. Will try to replicate your scenario. – Esteban Elverdin Jan 18 '13 at 02:36