4

Can someone explain why if I run this expression:

const string testValue = "ABC"; 
return NameDbContext.MasterNames
    .Where(m => m.Names.Any(n => n.LastName == testValue))
    .ToList();

I get the expected result, however if I run the same with testValue as a variable it fails:

string testValue = "ABC"; 
return NameDbContext.MasterNames
    .Where(m => m.Names.Any(n => n.LastName == testValue))
    .ToList();

This seems to happen only with string. Similar code with int worked fine with both testValue as a variable or as a constant.

I am suspecting this is due to the object nature of string. If this is the case how could I call this expression with a variable (I don't know the value of testValue at compile time).

Thank you.

EDIT:

This query is run against a large oracle data table (> 6 million rows). When using a constant it returns right away with the proper result set. When running with the variable, it seems that the where is applied very inefficiently (it takes over a minute to return).

EDIT2:

Tracing queries in the database I see:

When calling with a constant:

SELECT *
  FROM (SELECT   "Filter2"."MALPHA_KEY" AS "MALPHA_KEY"
      FROM (SELECT "Extent1"."MALPHA_KEY" AS "MALPHA_KEY",
          ROW_NUMBER () OVER (ORDER BY "Extent1"."MALPHA_KEY" ASC)
                                                              AS "row_number"
                    FROM "RMS"."ALPHA_MASTER_NAME" "Extent1"
                   WHERE (EXISTS (
                             SELECT 1 AS "C1"
                               FROM "RMS"."ALPHA" "Extent2"
                              WHERE (    ("Extent1"."MALPHA_KEY" =
                                                        "Extent2"."MALPHA_KEY"
                                         )
                                     AND ('ABC' = "Extent2"."LAST_NAME")
                                    ))
                         )) "Filter2"
           WHERE ("Filter2"."row_number" > 0)
        ORDER BY "Filter2"."MALPHA_KEY" ASC)
 WHERE (ROWNUM <= (50))

When calling with a variable:

SELECT *
  FROM (SELECT   "Project2"."MALPHA_KEY" AS "MALPHA_KEY"
            FROM (SELECT "Project2"."MALPHA_KEY" AS "MALPHA_KEY",
                         ROW_NUMBER () OVER (ORDER BY "Project2"."MALPHA_KEY" ASC)
                                                              AS "row_number"
                    FROM (SELECT "Extent1"."MALPHA_KEY" AS "MALPHA_KEY"
                            FROM "RMS"."ALPHA_MASTER_NAME" "Extent1"
                           WHERE (EXISTS (
                                     SELECT 1 AS "C1"
                                       FROM "RMS"."ALPHA" "Extent2"
                                      WHERE (    ("Extent1"."MALPHA_KEY" =
                                                        "Extent2"."MALPHA_KEY"
                                                 )
                                             AND (   ("Extent2"."LAST_NAME" =
                                                                   :p__linq__0
                                                     )
                                                  OR (    ("Extent2"."LAST_NAME" IS NULL
                                                          )
                                                      AND (:p__linq__0 IS NULL
                                                          )
                                                     )
                                                 )
                                            ))
                                 )) "Project2") "Project2"
           WHERE ("Project2"."row_number" > 0)
        ORDER BY "Project2"."MALPHA_KEY" ASC)
 WHERE (ROWNUM <= (50))

Note the difference in the where statement (beside the use of a variable) it tests for NULL equality

    AND (   ("Extent2"."LAST_NAME" = :p__linq__0
        )
   OR (    ("Extent2"."LAST_NAME" IS NULL )
   AND (:p__linq__0 IS NULL )  )  )

The test for the NULL is resulting in the full table scans...

Bibi
  • 73
  • 1
  • 5
  • 1
    Try ((n => n.LastName.Equals(testValue)), see if there is a difference. – Riad Baghbanli Apr 04 '16 at 22:44
  • 4
    What does "it fails" mean? – zerkms Apr 04 '16 at 22:44
  • 3
    Please explain what fails, what kind of error do you get? – Bas Apr 04 '16 at 22:44
  • You should hook up logging to your `DbContext` and inspect the queries generated in both cases. Either the variable version is forcing a full table load, or its translation to SQL results in a parameter type that doesn't match the column type in your database, causing indexes to be bypassed. – Kirill Shlenskiy Apr 05 '16 at 00:08
  • Is `testValue` a simple variable, or is it actually a property of some other variable? – Ian Mercer Apr 05 '16 at 00:15
  • @rbaghbanli: Using `((n => n.LastName.Equals(testValue))` does not make a difference – Bibi Apr 05 '16 at 00:24
  • @KirillShlenskiy: I just completed some database trace. See edited question. – Bibi Apr 05 '16 at 00:26
  • 2
    @Bibi, looking good, we're getting somewhere. First thing to try: `DbContext.Configuration.UseDatabaseNullSemantics = true` will cause the second query to produce a WHERE clause closer to the "const" version. If it's still slow after that, it's most likely parameter data type mismatch (`:p__linq__0`). – Kirill Shlenskiy Apr 05 '16 at 00:45
  • @KirillShlenskiy, I will try UseDatabaseNullSemantics = true first thing in the morning... however, in my use case eventhough Last Name is nullable, I will never search for Null Last Names. How can I "tell" linq or the dbcontext to NOT search for Null Last Name (the OR statement)? Alternatively, how can I specify that testVlaue is not nullable? – Bibi Apr 05 '16 at 01:34
  • 1
    @Bibi, that is exactly what turning on `UseDatabaseNullSemantics` does: it forces a straight `WHERE xxx = @param` clause without the additional null checks. – Kirill Shlenskiy Apr 05 '16 at 01:41
  • @KirillShlenskiy I think your last two comments most likely will solve the OP issue, why don't you assemble and drop an answer. – Ivan Stoev Apr 05 '16 at 07:18
  • Thanks for the suggestion @IvanStoev, I've done exactly that. – Kirill Shlenskiy Apr 05 '16 at 11:07
  • @KirillShlenskiy setting `DbContext.Configuration.UseDatabaseNullSemantics = true` worked great. Thank you for your help. – Bibi Apr 05 '16 at 15:27
  • I found [this article](http://code-ninja.org/blog/2014/08/20/entity-framework-performance-how-your-seeks-can-turn-into-scans/) that may shed some extra light in this. – Bibi Apr 05 '16 at 16:08

3 Answers3

3

Theory #1

If you have tested the generated query and determined that it is, in fact, the argument null check that's causing the full table scan, then the fix is pretty simple:

NameDbContext.Configuration.UseDatabaseNullSemantics = true;

This will result in a simplified WHERE clause:

WHERE "Extent2"."LAST_NAME" = :p__linq__0

Obviously, you will need to consider the effect this will have on other queries that use NameDbContext.

Alternatively you can use @IanMercer's highly entertaining solution and perform expression tree node substitution to get the desired WHERE clause. I expect the end result to be similar, although I'm not sure Oracle will be smart enough to produce a reusable query plan without explicit parametrization, which could result in some recompile overheads.

Theory #2

From personal experience (albeit with SQL Server, but since the general concepts are the same I will assume this can apply in your case) there can be another reason for a bypassed index, and that is type mismatch between your LAST_NAME column and the :p__linq__0 parameter. In my scenario the column in the database was non-unicode, but the parameter generated by EF was unicode (varchar vs nvarchar respectively - unicode being the default for EF), making an index seek impossible.

Kirill Shlenskiy
  • 9,367
  • 27
  • 39
  • 1
    Kirill, Theoriy #1: setting the `NameDbContext.Configuration.UseDatabaseNullSemantics = true;` in my situation worked perfectly. I will need to conduct further investigation on the implication on the rest of the application, but thank you for putting me on the right track. – Bibi Apr 05 '16 at 15:31
1

One way to fix this problem is to create a simple ExpressionVisitor that rewrites a parameter on an existing expression to a constant value using partial application.

For example, I create expressions and then apply a value (known only at runtime) to them:

 Expression<Func<int, int, bool>> expr = (a, b) => a < b;
 var applied = expr.Apply(input.FirstMonth);

And here's one of the (many) Apply methods I use (each takes a different number of arguments):

/// <summary>
/// Partially apply a value to an expression
/// </summary>
public static Expression<Func<U, bool>> Apply<T, U>(this Expression<Func<T, U, bool>> input,
    T value)
{
   var swap = new ExpressionSubstitute(input.Parameters[0],
       Expression.Constant(value));
   var lambda = Expression.Lambda<Func<U, bool>>(
       swap.Visit(input.Body), 
       input.Parameters[1]);
   return lambda;
}


class ExpressionSubstitute : System.Linq.Expressions.ExpressionVisitor
{
    private readonly Expression from, to;
    public ExpressionSubstitute(Expression from, Expression to)
    {
        this.from = from;
        this.to = to;
    }

    public override Expression Visit(Expression node)
    {
        if (node == from) return to;
        return base.Visit(node);
    }
}
Ian Mercer
  • 38,490
  • 8
  • 97
  • 133
0

When you create linq queries, you are actually building expressions trees. In your example you have two expressions trees to build your query:

Expression<Func<Name, bool>> exp1 = name => name.LastName == testValue;
Expression<Func<MasterName, bool>> exp2 = masterName => masterName.Names.Any(exp1);
var result = NameDbContext.MasterNames.Where(exp2).ToList();

From this answer Local variable and expression trees:

Capturing a local variable is actually performed by "hoisting" the local variable into an instance variable of a compiler-generated class. The C# compiler creates a new instance of the extra class at the appropriate time, and changes any access to the local variable into an access of the instance variable in the relevant instance.

So the expression tree then needs to be a field access within the instance - and the instance itself is provided via a ConstantExpression.

The simplest approach for working how to create expression trees is usually to create something similar in a lambda expression, then look at the generated code in Reflector, turning the optimization level down so that Reflector doesn't convert it back to lambda expressions.

If I define a local varaible string testValue = "ABC"; the debug view will output:

.Lambda #Lambda1<System.Func`2[ConsoleApp.Program+Name,System.Boolean]>(ConsoleApp.Program+Name $name)
{
    $name.LastName == .Constant<ConsoleApp.Program+<>c__DisplayClass0_0>(ConsoleApp.Program+<>c__DisplayClass0_0).testValue
}

Now if I define a constant const string testValue = "ABC"; the debug view will output:

.Lambda #Lambda1<System.Func`2[ConsoleApp.Program+Name,System.Boolean]>(ConsoleApp.Program+Name $name)
{
    $name.LastName == "ABC"
}
Community
  • 1
  • 1
Thomas
  • 24,234
  • 6
  • 81
  • 125
  • This doesn't seem to answer the question in full, but also: why would you call `Compile` on a simple expression (bypassing all the `IQueryable` goodies and forcing a full load of the table)? – Kirill Shlenskiy Apr 05 '16 at 00:05
  • This explains why there are different behaviors for local variables and constants. I've just split this query so that it is more understandable. Anyway I am not forcing a full load of a table... – Thomas Apr 05 '16 at 00:30
  • would you bet your house on that last statement? I'll reiterate, `DbSet.Where(expr)` and `DbSet.Where(expr.Compile())` result in totally different behaviour. – Kirill Shlenskiy Apr 05 '16 at 00:34
  • I don't have any house ^^ Thanks for that, I don't even know this ^^. I'll edit my post. – Thomas Apr 05 '16 at 00:48