3

I have a Select statement that is currently formatted like

dbEntity
.GroupBy(x => x.date)
.Select(groupedDate => new {
                             Calculation1 = doCalculation1 ? x.Sum(groupedDate.Column1) : 0),
                             Calculation2 = doCalculation2 ? x.Count(groupedDate) : 0)

In the query doCalculation1 and doCalculation2 are bools that are set earlier. This creates a case statement in the Sql being generated, like

DECLARE @p1 int = 1
DECLARE @p2 int = 0
DECLARE @p3 int = 1
DECLARE @p4 int = 0
SELECT (Case When @p1 = 1 THEN Sum(dbEntity.Column1)
     Else @p2
     End) as Calculation1,
     (Case When @p3 = 1 THEN Count(*)
     Else @p4
     End) as Calculation2

What I want to happen is for the generated sql is to be like this when doCalculation1 is true

SELECT SUM(Column1) as Calculation1, Count(*)  as Calculation2

and like this when doCalculation2 is false

SELECT 0 as Calculation1, Count(*) as Calculation2

Is there any way to force a query through EF to act like this?

Edit:

bool doCalculation = true;
bool doCalculation2 = false;
            dbEntity
            .Where(x => x.FundType == "E")
            .GroupBy(x => x.ReportDate)
              .Select(dateGroup => new 
              {
                  ReportDate = dateGroup.Key,
                  CountInFlows = doCalculation2 ? dateGroup.Count(x => x.Flow > 0) : 0,
                  NetAssetEnd = doCalculation ? dateGroup.Sum(x => x.AssetsEnd) : 0
              })
              .ToList();

generates this sql

-- Region Parameters
DECLARE @p0 VarChar(1000) = 'E'
DECLARE @p1 Int = 0
DECLARE @p2 Decimal(5,4) = 0
DECLARE @p3 Int = 0
DECLARE @p4 Int = 1
DECLARE @p5 Decimal(1,0) = 0
-- EndRegion
SELECT [t1].[ReportDate], 
    (CASE 
        WHEN @p1 = 1 THEN (
            SELECT COUNT(*)
            FROM [dbEntity] AS [t2]
            WHERE ([t2].[Flow] > @p2) AND ([t1].[ReportDate] = [t2].[ReportDate]) AND ([t2].[FundType] = @p0)
        )
        ELSE @p3
     END) AS [CountInFlows], 
    (CASE 
        WHEN @p4 = 1 THEN CONVERT(Decimal(33,4),[t1].[value])
        ELSE CONVERT(Decimal(33,4),@p5)
     END) AS [NetAssetEnd]
FROM (
    SELECT SUM([t0].[AssetsEnd]) AS [value], [t0].[ReportDate]
    FROM [dbEntity] AS [t0]
    WHERE [t0].[FundType] = @p0
    GROUP BY [t0].[ReportDate]
    ) AS [t1]

which has many index scans and a spool and a join in the execution plan. It also takes about 20 seconds on average to run on the test set, with the production set going to be much larger.

I want it to run in the same speed as sql like

select reportdate, 1, sum(AssetsEnd)
from vwDailyFundFlowDetail
where fundtype = 'E'
group by reportdate

which runs in about 12 seconds on average and has the majority of the query tied up in a single index seek in the execution plan. What the actual sql output is doesnt matter, but the performance appears to be much worse with the case statements.

As for why I am doing this, I need to generate a dynamic select statements like I asked in Dynamically generate Linq Select. A user may select one or more of a set of calculations to perform and I will not know what is selected until the request comes in. The requests are expensive so we do not want to run them unless they are necessary. I am setting the doCalculation bools based on the user request.

This query is supposed to replace some code that inserts or deletes characters from a hardcoded sql query stored as a string, which is then executed. That runs fairly fast but is a nightmare to maintain

Community
  • 1
  • 1
Alexander Burke
  • 534
  • 7
  • 22
  • 1
    Why do you want to change the SQL? Have you identified it as a performance issue? Does the execution plan look bad? – Mant101 Aug 18 '15 at 16:01
  • 1
    The version with case statements is consistently running about 40% slower than with the plain selects. The execution plan with case statements has lots of index scans as opposed to the index seek i get with the Select 0 version. I dont know enough about the execution plan to tell what else is happening but I am editing my question in a minute with the exact sql it generates – Alexander Burke Aug 18 '15 at 17:34

1 Answers1

2

It would technically be possible to pass the Expression in your Select query through an expression tree visitor, which checks for constant values on the left-hand side of ternary operators, and replaces the ternary expression with the appropriate sub-expression.

For example:

public class Simplifier : ExpressionVisitor
{
    public static Expression<T> Simplify<T>(Expression<T> expr)
    {
        return (Expression<T>) new Simplifier().Visit(expr);
    }

    protected override Expression VisitConditional(ConditionalExpression node)
    {
        var test = Visit(node.Test);
        var ifTrue = Visit(node.IfTrue);
        var ifFalse = Visit(node.IfFalse);

        var testConst = test as ConstantExpression;
        if(testConst != null)
        {
            var value = (bool) testConst.Value;
            return value ? ifTrue : ifFalse;
        }

        return Expression.Condition(test, ifTrue, ifFalse);
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Closed-over variables are represented as field accesses to fields on a constant object.
        var field = (node.Member as FieldInfo);
        var closure = (node.Expression as ConstantExpression);
        if(closure != null)
        {
            var value = field.GetValue(closure.Value);
            return VisitConstant(Expression.Constant(value));
        }
        return base.VisitMember(node);
    }
}

Usage example:

void Main()
{
    var b = true;
    Expression<Func<int, object>> expr = i => b ? i.ToString() : "N/A";
    Console.WriteLine(expr.ToString()); // i => IIF(value(UserQuery+<>c__DisplayClass0).b, i.ToString(), "N/A")
    Console.WriteLine(Simplifier.Simplify(expr).ToString()); // i => i.ToString()
    b = false;
    Console.WriteLine(Simplifier.Simplify(expr).ToString()); // i => "N/A"
}

So, you could use this in your code something like this:

Expression<Func<IGrouping<DateTime, MyEntity>>, ClassYouWantToReturn> select = 
    groupedDate => new {
        Calculation1 = doCalculation1 ? x.Sum(groupedDate.Column1) : 0),
        Calculation2 = doCalculation2 ? x.Count(groupedDate) : 0
    };
var q = dbEntity
    .GroupBy(x => x.date)
    .Select(Simplifier.Simplify(select))

However, this is probably more trouble than it's worth. SQL Server will almost undoubtedly optimize the "1 == 1" case away, and allowing Entity Framework to produce the less-pretty query shouldn't prove to be a performance problem.

Update

Looking at the updated question, this appears to be one of the few instances where producing the right query really does matter, performance-wise.

Besides my suggested solution, there are a few other choices: you could use raw sql to map to your return type, or you could use LinqKit to choose a different expression based on what you want, and then "Invoke" that expression inside your Select query.

StriplingWarrior
  • 151,543
  • 27
  • 246
  • 315
  • I'm not sure I understand. If that's more trouble than it is worth, what should are you suggesting to have the generated Sql end up as I showed in the second case, rather than become a case statement? – Alexander Burke Aug 18 '15 at 15:19
  • @AlexanderBurke: I updated the answer with an example of what it would take to write the expression visitor that would simplify your select expression, so that Entity Framework would produce the kind of query you're asking for. As you can see, it's fairly complicated, and it doesn't provide any real benefit because there's no real difference from SQL Server's perspective, and these queries are only for SQL Server to consume--they aren't meant to be human-readable. So I'm suggesting you leave your code as-is, and don't bother trying to change the SQL output. – StriplingWarrior Aug 18 '15 at 16:10
  • Thank you for the example. Im not sure if Im doing something incorrectly but the sql linq is generating for the case statements seems to be running 40-60% slower than the sql with just a Select 1. I am looking at the execution plan as well and the case statement version has a lot of small table scans and various joins that are taking up the majority of the query. The Select 1 version of the sql is taken up 72% by one table seek and 27% by a hastmatch and nested loop. Ill edit my question to include the exact linq query and the sql it generates so you can see – Alexander Burke Aug 18 '15 at 17:32
  • If you dont mind clarifying a bit more. I am implementing the expression visitor listed above. I think I am starting to understand how it works but the instantiation of the select Expression> cannot tell what type groupedDate is and so it cannot do Sum or Count. I cant figure out what the name of this type of thing even is so I dont know what to look for in terms of guides on its syntax. If you could tell me the name or correct syntax I would appreciate it – Alexander Burke Aug 18 '15 at 18:59
  • @AlexanderBurke: If you look at the type of expression expected by your Select() statement, that's what that expression will be. If you can use a refactoring tool like Resharper to extract your lambda expression as a variable, then it should use the correct type. What is the type of `dbEntity`? – StriplingWarrior Aug 18 '15 at 19:15
  • my dbentity is a table named dailyFlow and I can see the type of the Select. Resharper and visual studio say that the type of groupedDate in the assignment to the variable select, is `(paramater) ? groupedDate` which is not a a format I've seen before. It then cant tell what columns groupedDate should have. This is where I am lost – Alexander Burke Aug 18 '15 at 19:30
  • I figured out the issue with the typing, I just didnt understand what types Func needed it seemed. I got this solution to do what I want mostly. The execution plan between what I wrote out for plain sql and what is being generated by EF now is still slightly different, but the majority of the query time is spent on the same pieces. Its also running with only 10s of extra overhead on EF no matter the test size, instead of 40-60% slower. Thank you for the help – Alexander Burke Aug 18 '15 at 21:16
  • If youd like to update your answer, the only missing piece of the puzzle was that the variable select had to be of the type `Expression>, ClassYouWantToReturn>` – Alexander Burke Aug 18 '15 at 21:26
  • @AlexanderBurke: Oh, of course! Sorry about that. I updated the answer. BTW, 10s of overhead on EF seems like an awful lot. Is that a one-time cost while Entity Framework analyzes its data model, or are you seeing that multiple times within the same instance of the application? Particularly if it's the latter, that may be an indication of another problem that you'll want to track down. – StriplingWarrior Aug 19 '15 at 04:51
  • Im seeing it every time an ef query is run, compared to when I manually write the query I want. Its coming from the query itself, not EF analyzing the data model. 10s seems like a lot to me too, especially when its on a call thats 5s normally so I will be tracking it down. Thanks again for the help. I learned more getting this question answered than I have all year – Alexander Burke Aug 19 '15 at 15:43
  • @AlexanderBurke: You may want to look into [this issue](http://stackoverflow.com/a/7676999/120955). Sometimes SQL Server ends up using bad query execution plans that don't exhibit the same behavior when running SQL queries manually. – StriplingWarrior Aug 25 '15 at 15:00
  • I also remember running into an issue where the way I initialized my Entity Framework contexts prevented them from reusing some metadata analysis that was supposed to occur only once. And I've seen some people run into issues where DNS lookups actually caused basic connections to take a long time to resolve from .NET, and they had to fiddle with proxy settings. Those are probably all avenues worth investigating. – StriplingWarrior Aug 25 '15 at 15:12