1

I'm trying to write a simple SQL query in LinQ, and no matter how hard I try, I always get a complex query.

Here is the SQL I am trying to achieve (this is not what I'm getting):

SELECT
    ClearingAccounts.ID,
    SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount ELSE 0 END) AS Sum1,
    SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum2,
    SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount ELSE 0 END) AS Sum3,
    SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount ELSE 0 END) AS Sum4
FROM ClearingAccounts
LEFT JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
LEFT JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
GROUP BY ClearingAccounts.ID

Here is the code:

from clearingAccount in clearingAccounts
let payments = clearingAccount.Payments
let directDebits = clearingAccount.DirectDebits
select new
{
    ID = clearingAccount.ID,
    Sum1 = payments.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
    Sum2 = directDebits.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
    Sum3 = payments.Sum(p => p.StatusID == 2 ? p.TotalAmount : 0),
    Sum4 = directDebits.Sum(p => p.StatusID == 1 ? p.TotalAmount : 0),
}

The generated query gets the data from the respective table for each sum, so four times. I'm not sure if it's even possible to optimize this?

EDIT Here the is generated query:

SELECT 
    [Project5].[ID] AS [ID], 
    [Project5].[C1] AS [C1], 
    [Project5].[C2] AS [C2], 
    [Project5].[C3] AS [C3], 
    [Project5].[C4] AS [C4]
    FROM ( SELECT 
        [Project4].[ID] AS [ID], 
        [Project4].[C1] AS [C1], 
        [Project4].[C2] AS [C2], 
        [Project4].[C3] AS [C3], 
        (SELECT 
            SUM([Filter5].[A1]) AS [A1]
            FROM ( SELECT 
                CASE WHEN (1 = [Extent5].[StatusID]) THEN [Extent5].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
                FROM [dbo].[DirectDebits] AS [Extent5]
                WHERE [Project4].[ID] = [Extent5].[ClearingAccountID]
            )  AS [Filter5]) AS [C4]
        FROM ( SELECT 
            [Project3].[ID] AS [ID], 
            [Project3].[C1] AS [C1], 
            [Project3].[C2] AS [C2], 
            (SELECT 
                SUM([Filter4].[A1]) AS [A1]
                FROM ( SELECT 
                    CASE WHEN (2 = [Extent4].[StatusID]) THEN [Extent4].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
                    FROM [dbo].[Payments] AS [Extent4]
                    WHERE [Project3].[ID] = [Extent4].[ClearingAccountID]
                )  AS [Filter4]) AS [C3]
            FROM ( SELECT 
                [Project2].[ID] AS [ID], 
                [Project2].[C1] AS [C1], 
                (SELECT 
                    SUM([Filter3].[A1]) AS [A1]
                    FROM ( SELECT 
                        CASE WHEN (2 = [Extent3].[StatusID]) THEN [Extent3].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
                        FROM [dbo].[DirectDebits] AS [Extent3]
                        WHERE [Project2].[ID] = [Extent3].[ClearingAccountID]
                    )  AS [Filter3]) AS [C2]
                FROM ( SELECT 
                    [Project1].[ID] AS [ID], 
                    (SELECT 
                        SUM([Filter2].[A1]) AS [A1]
                        FROM ( SELECT 
                            CASE WHEN (1 = [Extent2].[StatusID]) THEN [Extent2].[TotalAmount] ELSE cast(0 as decimal(18)) END AS [A1]
                            FROM [dbo].[Payments] AS [Extent2]
                            WHERE [Project1].[ID] = [Extent2].[ClearingAccountID]
                        )  AS [Filter2]) AS [C1]
                    FROM ( SELECT 
                        [Extent1].[ID] AS [ID]
                        FROM [dbo].[ClearingAccounts] AS [Extent1]
                        WHERE ([Extent1].[CustomerID] = 3) AND ([Extent1].[Deleted] <> 1)
                    )  AS [Project1]
                )  AS [Project2]
            )  AS [Project3]
        )  AS [Project4]
    )  AS [Project5]
Gabriel G. Roy
  • 2,552
  • 2
  • 27
  • 39
  • If you were to write the SQL yourself, how would you optimize it? The query seems fine to me. Have you profiled it to confirm that this is in fact not optimal? – sstan Jul 16 '15 at 15:21
  • The first query is the one I'm trying to achieve. I'm going to link what I'm getting instead, maybe it wasn't clear enough? – Gabriel G. Roy Jul 16 '15 at 15:23
  • Ah, I got the impression that the SQL you posted was generated by EF. Yes, please do post the EF-generated SQL. – sstan Jul 16 '15 at 15:24
  • 1
    I know it's blasphemy but why don't you use the hand written sql directly? EF can be used as a mapper only. Actually don't expect top performance and elegant queries from an ORM. – MikeSW Jul 16 '15 at 15:48
  • 1
    Your SQL query is wrong. LEFT JOINing multiple rows two times results in a cross product of those joined rows. Your sums are too big. – usr Jul 16 '15 at 16:24

2 Answers2

1

Edit

Note that as per @usr's comment, that your original Sql Query is broken. By LEFT OUTER joining on two independent tables, and then grouping on the common join key, as soon as one of the DirectDebits or Payments tables returns more than one row, you will erroneously duplicate the TotalAmount value in the 'other' SUMmed colums (and vice versa). e.g. If a given ClearingAccount has 3 DirectDebits and 4 Payments, you will get a total of 12 rows (whereas you should be summing 3 and 4 rows independently for the two tables). A better Sql Query would be:

WITH ctePayments AS
(
  SELECT
      ClearingAccounts.ID,
      -- Note the ELSE 0 projection isn't required as nulls are eliminated from aggregates
      SUM(CASE WHEN Payments.StatusID = 1 THEN Payments.TotalAmount END) AS Sum1,
      SUM(CASE WHEN Payments.StatusID = 2 THEN Payments.TotalAmount END) AS Sum3
  FROM ClearingAccounts
  INNER JOIN Payments ON Payments.ClearingAccountID = ClearingAccounts.ID
  GROUP BY ClearingAccounts.ID
),
cteDirectDebits AS
(
  SELECT
      ClearingAccounts.ID,
      SUM(CASE WHEN DirectDebits.StatusID = 2 THEN DirectDebits.TotalAmount END) AS Sum2,
      SUM(CASE WHEN DirectDebits.StatusID = 1 THEN DirectDebits.TotalAmount END) AS Sum4
  FROM ClearingAccounts
  INNER JOIN DirectDebits ON DirectDebits.ClearingAccountID = ClearingAccounts.ID
  GROUP BY ClearingAccounts.ID
)
SELECT ca.ID, COALESCE(p.Sum1, 0) AS Sum1, COALESCE(d.Sum2, 0) AS Sum2, 
       COALESCE(p.Sum3, 0) AS Sum3, COALESCE(d.Sum4, 0) AS Sum4
FROM
  ClearingAccounts ca
  LEFT OUTER JOIN ctePayments p
    ON ca.ID = p.ID
  LEFT OUTER JOIN cteDirectDebits d
    ON ca.ID = d.ID;
  -- GROUP BY not required, since we have already guaranteed at most one row 
  -- per joined table in the CTE's, assuming ClearingAccounts.ID is unique;

You'll want to fix and test this with test cases before you even contemplate conversion to LINQ.

Old Answer(s)

The Sql construct:

SELECT SUM(CASE WHEN ... THEN 1 ELSE 0 END) AS Something

when applied in a SELECT list, is a common hack 'alternative' to pivot data from the 'greater' select into columns which meet the projection criteria (and hence the zero if not matched) . It isn't really a sum at all, its a 'matched' count.

With regards to optimizing the Sql generated, another alternative would be to materialize the data after joining and grouping (and of course, if there is a predicate WHERE clause, apply that in Sql too via IQueryable), and then do the conditional summation in memory:

var result2 = Db.ClearingAccounts
    .Include(c => c.Payments)
    .Include(c => c.DirectDebits)
    .GroupBy(c => c.Id)
    .ToList() // or any other means to force materialization here.
    .ToDictionary(
        grp => grp.Key,
        grp => new
        {
            PaymentsByStatus = grp.SelectMany(x => x.Payments)
                .GroupBy(p => p.StatusId),
            DirectDebitByStatus = grp.SelectMany(x => x.Payments)
                .GroupBy(p => p.StatusId),
        })
    .Select(ca => new
    {
        ID = ca.Key,
        Sum1 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 1)
            .Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
        Sum2 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 2)
            .Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum()),
        Sum3 = ca.Value.PaymentsByStatus.Where(pbs => pbs.Key == 2)
            .Select(pbs => pbs.Select(x => x.TotalAmount).Sum()),
        Sum4 = ca.Value.DirectDebitByStatus.Where(pbs => pbs.Key == 1)
            .Select(ddbs => ddbs.Select(x => x.TotalAmount).Sum())
    });

However, personally, I would leave this pivot projection directly in Sql, and then use something like SqlQuery to then deserialize the result back from Sql directly into the final Entity type.

StuartLC
  • 104,537
  • 17
  • 209
  • 285
  • 1
    In this case, he really ***is*** doing a sum. – sstan Jul 16 '15 at 15:25
  • 2
    But he's summing `p.TotalAmount`, not the value `1`. Am I missing something? – sstan Jul 16 '15 at 15:30
  • You can't sum booleans, so the conditional must be added back. – usr Jul 16 '15 at 16:19
  • Yup, I missed the TotalAmount sumation and assumed OP needed a simple 1 / 0 count - hence the comments by @sstan. And to confirm your point that any time there is more than one DirectDebit or Payment row per ClearingAccount that the summation will erroneously multiply. – StuartLC Jul 16 '15 at 17:12
-1

1) Add AsNoTracking in EF to avoid tracking changes.

Check that you have indexes on the columns you are using for the JOINs. Especially the column that you are using to group by. Profile the query and optimize it. EF has also overhead over a stored procedure.

or

2) If you cannot find a way to make it as fast as you need, create a stored procedure and call it from EF. Even the same query will be faster.

Francisco Goldenstein
  • 13,299
  • 7
  • 58
  • 74
  • I have seen many examples where SPs where much faster. It's not only about the query, it's also about what EF does under the hood. Last week I had a query that was performing slow, I created a SP with the same query and call it using EF and it was much faster. Before using the SP I was not tracking changes in EF. ORMs are great but when the query is complex and you need high performance, SPs are always faster. – Francisco Goldenstein Jul 16 '15 at 16:13
  • By SP you mean hand written sql. It's not a fair comparison between EF generated sql and hand coded sql, SP or not. – MikeSW Jul 16 '15 at 16:17
  • That's why I said, if you cannot get the performance you need, you'll have to write a SP and call it from EF. Performance cannot be faster than that. – Francisco Goldenstein Jul 16 '15 at 16:18
  • 1
    And what I've said is SP or not ,it doesn't matter as long as it's hand coded not EF generated. – MikeSW Jul 16 '15 at 16:27