1

I'm trying to get the Entity Framework (v6.1.3) to generate efficient SQL for a query with multiple aggregates.

Here's a simplified example.

Table:

CREATE TABLE [dbo].[CaseAttorney](
[CaseAttorneyID] [int] IDENTITY(1,1) NOT NULL,
[CaseNumber] [varchar](30) NOT NULL,
[AttorneyID] [int] NOT NULL,
[DateAssigned] [datetime] NULL,
[DateUnassigned] [datetime] NULL,
 CONSTRAINT [PK_CaseAttorney] PRIMARY KEY CLUSTERED 
(
    [CaseAttorneyID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

C#:

using (var cx = new DATA())
{

    var startDate = DateTime.Parse("1/1/2014");
    var endDate = startDate.AddDays(1);

    cx.Database.Log = Console.WriteLine;

    var res = cx.CaseAttorneys.
        GroupBy(o => new
        {
            AttorneyID = o.AttorneyID
        }).Select(g => new
        {
            AttorneyID = g.Key.AttorneyID,
            ActiveStart = g.Sum(item => (item.DateAssigned < startDate && (item.DateUnassigned == null || item.DateUnassigned >= startDate) ? 1 : 0)),
            Assigned = g.Sum(item => (item.DateAssigned >= startDate && item.DateAssigned <= endDate) ? 1 : 0)
        }).ToArray();

}

Instead of generating a query with a single GROUP BY, I get a very inefficient query containing multiple nested tables. This happens with both COUNTand SUM:

SELECT 
[Project3].[AttorneyID] AS [AttorneyID], 
[Project3].[C1] AS [C1], 
[Project3].[C2] AS [C2]
FROM ( SELECT 
    [Project2].[AttorneyID] AS [AttorneyID], 
    [Project2].[C1] AS [C1], 
    (SELECT 
        SUM([Filter2].[A1]) AS [A1]
        FROM ( SELECT 
            CASE WHEN (([Extent3].[DateAssigned] >= @p__linq__2) AND ([Extent3].[DateAssigned] <= @p__linq__3)) THEN 1 ELSE 0 END AS [A1]
            FROM [dbo].[CaseAttorney] AS [Extent3]
            WHERE [Project2].[AttorneyID] = [Extent3].[AttorneyID]
        )  AS [Filter2]) AS [C2]
    FROM ( SELECT 
        [Distinct1].[AttorneyID] AS [AttorneyID], 
        (SELECT 
            SUM([Filter1].[A1]) AS [A1]
            FROM ( SELECT 
                CASE WHEN (([Extent2].[DateAssigned] < @p__linq__0) AND (([Extent2].[DateUnassigned] IS NULL) OR ([Extent2].[DateUnassigned] >= @p__linq__1))) THEN 1 ELSE 0 END AS [A1]
                FROM [dbo].[CaseAttorney] AS [Extent2]
                WHERE [Distinct1].[AttorneyID] = [Extent2].[AttorneyID]
            )  AS [Filter1]) AS [C1]
        FROM ( SELECT DISTINCT 
            [Extent1].[AttorneyID] AS [AttorneyID]
            FROM [dbo].[CaseAttorney] AS [Extent1]
        )  AS [Distinct1]
    )  AS [Project2]
)  AS [Project3]

The nesting in and of itself wouldn't be too bad if it didn't keep hitting the same tables over and over again. This problem gets worse the more aggregate columns are added.

I haven't found any similar questions here, so I'm sure I'm doing something wrong.

What is the correct way of getting the Entity Framework to generate an efficient projection when I want to return multiple aggregate columns?

PJ7
  • 763
  • 5
  • 15
  • 1
    You can always write your own SQL in a stored proc or just a string and have EF call that. – GendoIkari Feb 10 '16 at 16:21
  • Stored procedures are my fallback, but one of the points of using EF is DRY. I would have liked to put the logic behind such questions as 'what is an assigned case', and all the other derived properties, into projectable expressions which could be shared between queries and other methods in my service layer. I give up some maintainability by repeating logic in sprocs or views. I want all business logic in one place, but maybe ORM still isn't ready to fully support that, even for some straightforward cases. – PJ7 Feb 10 '16 at 18:22

1 Answers1

1

Count(predicate) (and actually any function involving predicates) seems to have that effect on the generated SQL query.

However, the conditional Sum (i.e. Sum(predicate ? 1 : 0)) has no such affect, so the following will do what you want:

Update: It turns out that the Sum trick is necessary, but not enough when predicates use variables like in your case. It most probably is EF bug, because playing with different GroupBy overloads doesn't help, except if you include a temporary projection including conditional expressions before doing GroupBy.

So (finally) the following query

db.CaseAttorneys.Select(item => new
{
    Item = item,
    ActiveStart = item.DateAssigned < startDate && (item.DateUnassigned == null || item.DateUnassigned >= startDate) ? 1 : 0,
    Assigned = item.DateAssigned >= startDate && item.DateAssigned <= endDate ? 1 : 0
})
.GroupBy(o => new
{
    AttorneyID = o.Item.AttorneyID
})
.Select(g => new
{
    AttorneyID = g.Key.AttorneyID,
    ActiveStart = g.Sum(item => item.ActiveStart),
    Assigned = g.Sum(item => item.Assigned)
}).ToArray();

produced the desired SQL

SELECT 
    [GroupBy1].[K1] AS [AttorneyID], 
    [GroupBy1].[A1] AS [C1], 
    [GroupBy1].[A2] AS [C2]
    FROM ( SELECT 
        [Extent1].[K1] AS [K1], 
        SUM([Extent1].[A1]) AS [A1], 
        SUM([Extent1].[A2]) AS [A2]
        FROM ( SELECT 
            [Extent1].[AttorneyID] AS [K1], 
            CASE WHEN (([Extent1].[DateAssigned] < @p__linq__0) AND (([Extent1].[DateUnassigned] IS NULL) OR ([Extent1].[DateUnassigned] >= @p__linq__1))) THEN 1 ELSE 0 END AS [A1], 
            CASE WHEN (([Extent1].[DateAssigned] >= @p__linq__2) AND ([Extent1].[DateAssigned] <= @p__linq__3)) THEN 1 ELSE 0 END AS [A2]
            FROM [dbo].[CaseAttorneys] AS [Extent1]
        )  AS [Extent1]
        GROUP BY [K1]
    )  AS [GroupBy1]
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • Unfortunately I already tried that. At least using v6.13, the generated SQL for sums is similarly poor. – PJ7 Feb 10 '16 at 18:09
  • You think I'm posting something w/o checking it (not with your model of course)? I'm on EF6.1.3 and the SQL is much much better compared to `Count` version (just one real `SELECT from dbo.Table` and `GROUP BY`). I guess `v6.13` means `v6.1.3`, right? If yes, try again, there is **no** other LINQ way. If no, then upgrade. – Ivan Stoev Feb 10 '16 at 18:20
  • @PJ7 Whaw, I'm taking my words back. In my test I've used int field criterias with constant values embedded (like `x >= 3 && x <= 5`) and the `Sum` solved the problem. But once I use `int a = 3, b = 5;` and then `'x >= a && x <= b`, similar to your case, the bad SQL came back! So no resolution for now, thank you for posting the case. – Ivan Stoev Feb 10 '16 at 19:22
  • This also works with the ternary operation in the select statement. Thanks. – PJ7 Feb 11 '16 at 14:10