0

We are using entity framework to work with a table with 50 million entries (at a Microsoft SQL Server).

public async Task<List<TableName>> MyMethod(int fooId, int count)
{
    using (var context = new Context(ConnectionString))
    {
        return
            await context.TableName.AsNoTracking()
                .Where(_ => _.FooId== fooId)
                .OrderByDescending(_ => _.DateCreated)
                .Take(count)
                .ToListAsync();
    }
}

Entity Framework translates this to (beautified):

declare @fooId int = 42
SELECT TOP (100) *
FROM TableName
WHERE FooId = @fooId
ORDER BY DateCreated DESC

The columns FooId and DateCreated both got an index, but SQL Server does a full table scan anyway, which takes very long. This is caused because 42 is assigned outside of the statement (and seems to be related to selectivity). The indexes would be used if you wrote

WHERE FooId = 42

Is there a way to get Entity Framework to optimize the generated query? At the moment, my only way seems to be using raw SQL queries for huge tables in my code :-/ Is there a better workaround?

Edit: More details as requested in the comments: Non beautified query that is generated by entity framework:

SELECT TOP (100) 
    [Project1].[DateCreated] AS [DateCreated], 
    [Project1].[FooId] AS [FooId]
    FROM ( SELECT 
        [Extent1].[DateCreated] AS [DateCreated], 
        [Extent1].[FooId] AS [FooId]
        FROM [dbo].[TableName] AS [Extent1]
        WHERE [Extent1].[FooId] = @p__linq__0
    )  AS [Project1]
    ORDER BY [Project1].[DateCreated] DESC

-- p__linq__0: '42' (Type = Int32, IsNullable = false) 

The create script for the index:

CREATE NONCLUSTERED INDEX [IX_TableName_FooId] ON [dbo].[TableName]
(
    [FooId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 85) ON [SECONDARY]
GO

Create script for my table:

CREATE TABLE [dbo].[TableName](
    [DateCreated] [datetime] NOT NULL,
    [FooId] [int] NULL
) ON [PRIMARY]

Execution Plan

TheWho
  • 485
  • 3
  • 17
  • 2
    Please show index definition. – Sergey Kalinichenko Nov 22 '17 at 09:31
  • 1
    Just to clarify: have you tried hard coding the value and confirmed that indexes are used? How many unique values of `FooId` are there? Is there any chance you ran this statement against an empty table then populated the table? – Nick.Mc Nov 22 '17 at 09:34
  • 1
    Can you post here ACTUAL execution plan of this query? – sepupic Nov 22 '17 at 10:14
  • I added the exact query and the index create script, hope it helps. – TheWho Nov 22 '17 at 10:42
  • No, it did not make a difference, if I move AsNoTracking() after Take. There are about 50 different entries for FooId = 42 in the database. – TheWho Nov 22 '17 at 10:45
  • @Nick.McDermaid: I can reproduce the problem with my generated query in Management Studio. If I replace the variable there with the hard coded value it works like a charm. – TheWho Nov 22 '17 at 10:49
  • `I can reproduce the problem with my generated query in Management Studio` When you say 'reproduce the problem' what do you mean exactly? Do you mean it is slower? Or you have examined the execution plan? Or something else? Does the problem occur if you **remove** `.AsNoTracking()`? – mjwills Nov 22 '17 at 11:01
  • I added the excution plan and the create statement – TheWho Nov 22 '17 at 11:06
  • With 'reproducing the problem in Management Studio', I mean that the query takes a few mili seconds when executed with a hard coded value and over a minute when using a variable. – TheWho Nov 22 '17 at 11:08
  • Does updating statistics or clearing the cache help? https://technet.microsoft.com/en-us/library/ms187348(v=sql.110).aspx https://stackoverflow.com/questions/1873025/how-can-i-clear-the-sql-server-query-cache – mjwills Nov 22 '17 at 11:43
  • >>>I added the excution plan and the create statement <<< This is not a plan. It's hust a picture where no one can see operators properties. Actual plan should be placed here: https://www.brentozar.com/pastetheplan/ – sepupic Nov 22 '17 at 13:46
  • 1
    Did you try to add option(recompile) to your query? – sepupic Nov 22 '17 at 13:46
  • Updateing the statistics and clearing the cache did not help. – TheWho Nov 24 '17 at 08:37
  • We are experimenting with recompile at the moment, this seems to work :-) – TheWho Nov 24 '17 at 08:38

2 Answers2

1

It's a problem with parameter sniffing.

Your query will be executed by the stored procedure which gets an input parameter:

EXEC sp_executesql N'SELECT TOP (100) *
FROM TableName
WHERE FooId = = @p__linq__0
ORDER BY DateCreated DESC', ' @p__linq__0 int', @p__linq__0 = 42

You should change the query to change the query parameter to a local variable:

EXEC sp_executesql N'
    DECLARE @pp__linq__0 int = @p__linq__0
    SELECT TOP (100) *
    FROM TableName
    WHERE FooId = = @pp__linq__0
    ORDER BY DateCreated DESC', ' @p__linq__0 int', @p__linq__0 = 42

For example our solution for this issue:

public class ParamPositioningInterceptor : DbCommandInterceptor
{
    public override void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        if (command.CommandText.StartsWith("SELECT") && command.Parameters.Count > 0)
        {
            StringBuilder sb1 = new StringBuilder(command.CommandText);
            StringBuilder sb2 = new StringBuilder();

            SqlParameter[] array = new SqlParameter[command.Parameters.Count];
            command.Parameters.CopyTo(array, 0);

            foreach (SqlParameter p in array.OrderByDescending(x => x.ParameterName.Length))
            {
                sb1.Replace("@" + p, "@p" + p);

                switch (p.SqlDbType)
                {
                    case SqlDbType.Char:
                    case SqlDbType.VarChar:
                    case SqlDbType.NChar:
                    case SqlDbType.NVarChar:
                        sb2.AppendFormat("DECLARE @p{0} {1}({2}) = @{0}", p, p.SqlDbType, p.Size);
                        break;
                    case SqlDbType.Decimal:
                        sb2.AppendFormat("DECLARE @p{0} {1}({2},{3}) = @{0}", p, p.SqlDbType, p.Precision, p.Scale);
                        break;
                    default:
                        sb2.AppendFormat("DECLARE @p{0} {1} = @{0}", p, p.SqlDbType);
                        break;
                }

                sb2.AppendLine();
            }

            command.CommandText = sb2.Append(sb1).ToString();
        }
        //
        base.ReaderExecuting(command, interceptionContext);
    }
}
Denis Rubashkin
  • 2,151
  • 1
  • 9
  • 14
  • Too bad I can only accept one answer. We added a recompile like @sepupic explained with a DBInterceptor, similar to your answer. This is working like a charm and the query takes a few milliseconds instead of minutes. – TheWho Dec 04 '17 at 11:21
0

If your code is executed exactly like this:

declare @fooId int = 42
SELECT TOP (100) *
FROM TableName
WHERE FooId = @fooId
ORDER BY DateCreated DESC

i.e. has no parameters but local variables only, the estimated number of rows is C^1/2 (C = table cardinality) with non unique value. It will imply full scan.

You can try to pass the real value by using option(recompile).

This will cause your statement to be recompiled when the variable is already assigned, i.e. execution plan will consider a value that you effectivly passed.

To see what is actually executed and what was the estimation and real number of rows you should provide the actual execution plan.

TheWho
  • 485
  • 3
  • 17
sepupic
  • 8,409
  • 1
  • 9
  • 20
  • We added a recompile like you explained with a DBInterceptor, similar to @Denis Rubashkins answer. This is working like a charm and the query takes a few milliseconds instead of minutes. – TheWho Dec 04 '17 at 11:20