10

I was given the following query in SQL (variable names obfuscated) which is trying to get the values (Ch, Wa, Bu, Hi) resulting in the greatest number (cnt) of Pi entries.

select top 1 Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi, COUNT(1) as cnt 
from Product, Si, Pi
where Product.Id = Si.ProductId
and Si.Id = Pi.SiId
and Product.Code = @CodeParameter
group by Pi.Ch, Pi.Wa, Pi.Bu, Pi.Hi
order by cnt desc

which runs instantly in SQL management studio on our production database. I've successfully written the code a few ways in C# LINQ and Entity Framework, but every way the code runs in 8 - 10 seconds. One attempt is the following code (doing it without the print as one call gives the same performance results):

using(var context = new MyEntities()){
    var query = context.Products
        .Where(p => p.Code == codeFromFunctionArgument)
        .Join(context.Sis, p => p.Id, s => s.ProductId, (p, s) => new { sId = s.Id })
        .Join(context.Pis, ps => ps.sId, pi => pi.SiId, (ps, pi) => new {pi.Ch, pic.Wa, pic.Bu, pic.Hi})
        .GroupBy(
            pi => pi,
            (k, g) => new MostPisResult()
            {
                Ch = k.Ch,
                Wa = k.Wa,
                Bu = k.Bu,
                Hi = k.Hi,
                Count = g.Count()
            }
        )
        .OrderByDescending(x => x.Count);
        Console.WriteLine(query.ToString());
        return query.First();
    }
}

which outputs the following SQL statements:

SELECT 
    [Project1].[C2] AS [C1], 
    [Project1].[Ch] AS [Ch], 
    [Project1].[Wa] AS [Wa], 
    [Project1].[Bu] AS [Bu], 
    [Project1].[Hi] AS [Hi], 
    [Project1].[C1] AS [C2]
    FROM ( SELECT 
        [GroupBy1].[A1] AS [C1], 
        [GroupBy1].[K1] AS [Ch], 
        [GroupBy1].[K2] AS [Wa], 
        [GroupBy1].[K3] AS [Bu], 
        [GroupBy1].[K4] AS [Hi], 
        1 AS [C2]
        FROM ( SELECT 
            [Extent3].[Ch] AS [K1], 
            [Extent3].[Wa] AS [K2], 
            [Extent3].[Bu] AS [K3], 
            [Extent3].[Hi] AS [K4], 
            COUNT(1) AS [A1]
            FROM   [dbo].[Product] AS [Extent1]
            INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
            INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
            WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
            GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
        )  AS [GroupBy1]
    )  AS [Project1]
    ORDER BY [Project1].[C1] DESC

I've also tried in query syntax with about the same result. I also tried (but not for very long) executing the original SQL query directly with EF, but couldn't quickly get it working.

Is there some mistake I'm doing in translating the query to LINQ? Is there an obvious way I'm missing to improve the query? Is it possible to write the query in EF / LINQ with the same performance as the SQL statements?

====== Update ======

In SQL profiler the output for the original query is exactly the same. For the LINQ query it is very similar to what I posted above.

exec sp_executesql N'SELECT TOP (1) 
    [Project1].[C2] AS [C1], 
    [Project1].[Ch] AS [Ch], 
    [Project1].[Wa] AS [Wa], 
    [Project1].[Bu] AS [Bu], 
    [Project1].[Hi] AS [Hi], 
    [Project1].[C1] AS [C2]
    FROM ( SELECT 
        [GroupBy1].[A1] AS [C1], 
        [GroupBy1].[K1] AS [Ch], 
        [GroupBy1].[K2] AS [Wa], 
        [GroupBy1].[K3] AS [Bu], 
        [GroupBy1].[K4] AS [Hi], 
        1 AS [C2]
        FROM ( SELECT 
            [Extent3].[Ch] AS [K1], 
            [Extent3].[Wa] AS [K2], 
            [Extent3].[Bu] AS [K3], 
            [Extent3].[Hi] AS [K4], 
            COUNT(1) AS [A1]
            FROM   [dbo].[Product] AS [Extent1]
            INNER JOIN [dbo].[Si] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
            INNER JOIN [dbo].[Pi] AS [Extent3] ON [Extent2].[Id] = [Extent3].[SiId]
            WHERE ([Extent1].[Code] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
            GROUP BY [Extent3].[Ch], [Extent3].[Wa], [Extent3].[Bu], [Extent3].[Hi]
        )  AS [GroupBy1]
    )  AS [Project1]
    ORDER BY [Project1].[C1] DESC',N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'109579'

====== Update 2 ======

Here's the obfuscated XML output of the query execution plan on Snipt.org. Note the variable in question here is named "MagicalCode" in the output and both values "109579" and "2449-268-550" are valid (strings in C#) as in the final line of the XML output.

<ParameterList>
    <ColumnReference
        Column="@p__linq__0"
        ParameterCompiledValue="N'109579'"
        ParameterRuntimeValue="N'2449-268-550'" />
</ParameterList>

Plan image with actual row counts displayed

Plan

====== Update 3 ======

(hidden in a comment) I ran the EF generated SQL from entity framework in SSMS and it ran instantly. So I might be suffering from some form of parameter sniffing as hinted by this question. I'm not sure how to deal with it in the context of entity framework.

====== Update 4 ======

Updated Entity Framework SQL Execution Plan and SSMS SQL Query Execution Plan that can be opened with Plan Explorer.

====== Update 5 ======

Some workaround attempts

  • Running the original query using context.Database.SqlQuery<ReturnObject>(...) ran in ~4-5 seconds.
  • Running the original query using SqlCommand and the connection string obtained from EF context took about 3 seconds (context initialization overhead).
  • Running the original query using SqlCommand took with hardcoded connection string takes about 1.5 seconds. So I ended up using the last one for now. The last thing I can think of is writing a stored procedure to get closer to the "instant" performance of running the query in SSMS.
Community
  • 1
  • 1
craastad
  • 6,222
  • 5
  • 32
  • 46
  • The linq generated query doesn't have `TOP 1`? Please post the actual execution plans anyway. – Martin Smith Dec 16 '13 at 10:35
  • 1
    @MartinSmith `query.ToString()` is executed before `query.First()`, so it won't show `TOP 1` – ken2k Dec 16 '13 at 10:40
  • When do you start the watch when using EF (console app, win app, web app)? How long does the generated query takes in SSMS ? – tschmit007 Dec 16 '13 at 10:41
  • Capture the final query (including the TOP 1) with SQL Profiler. Please post both execution plans. The problem will be apparent then. – usr Dec 16 '13 at 10:47
  • I updated the output from SQL profiler. It doesn't give me any different output for the original query (I've also tried LinqPad for this, if anyone knows how I can get more useful output, let me know). – craastad Dec 16 '13 at 11:59
  • 1
    The execution plans as well, please. – usr Dec 16 '13 at 12:07
  • 1
    What is the datatype of `Product.Code`? If `varchar` you will be [getting an implicit cast](http://stackoverflow.com/questions/7637907/query-extremely-slow-in-code-but-fast-in-ssms/7638059#7638059) that will prevent seeking on code. – Martin Smith Dec 16 '13 at 12:19
  • The execution plans were virtually identical, with costs in the same step PK_Pi lookup (so no point in posting them). Running the generated LINQ query in sql management actually ran in less than a second. Yes, code is an nvarchar(4000), so the problem must be related to what @MartinSmith posted, I'll look into that. – craastad Dec 16 '13 at 12:35
  • @craastad - That's only one possible suggestion. Parameter sniffing would be another. If you post the XML for the actual execution plan for the slow query we should see where the problem is. – Martin Smith Dec 16 '13 at 12:40
  • That plan doesn't open correctly in SSMS. I think you mangled it when obfuscating it. – Martin Smith Dec 16 '13 at 13:26
  • Opens in plan explorer OK though. Also can you let us know the output of `SET STATISTICS IO ON;SET STATISTICS TIME ON;` then running the slow query? And one other thing to try. Whilst the slow plan is executing poll `sys.dm_os_waiting_tasks` and see what wait types that session is getting? – Martin Smith Dec 16 '13 at 13:41
  • If the execution plans were in fact "nearly identical" we'd see the same performance. I therefore think that they differ in some important detail. Please post both. – usr Dec 16 '13 at 15:55
  • Though unless every single key lookup/seek is resulting in a random IO from slow disc I don't really see how that plan runs that slowly. Hence request for the other information. – Martin Smith Dec 16 '13 at 16:14
  • it might be because of way EF run SQL queries. EF uses sp_exectuesql to run queries which results in such issue. – Shoaib Shaikh Dec 16 '13 at 16:51
  • Following your update the plans look basically the same except for a slightly smaller memory grant for the EF one (1984 vs 1936). Any news on the wait and IO stats? – Martin Smith Dec 16 '13 at 17:13
  • I couldn't figure out how to get WAIT and IO stats in context of entity framework (i tried naively running the commands using context.Database.executeSql, but didn't see any reasonable output). – craastad Dec 17 '13 at 15:49
  • @craastad - Can you repro the same poor performance in management studio if you copy and paste the same `sp_executesql` command? – Martin Smith Dec 17 '13 at 16:18
  • @MartinSmith Nope, it's always snappy when ran in Management Studio. – craastad Dec 18 '13 at 09:44
  • So maybe the time is spent by the client as Mike Peterson suggests. My next step would be to use SQL Profiler/SQL Trace/Extended Events or the DMVs to compare the reported EF workload on the server vs that reported when running the same query in management studio (logical reads, writes etc.). And if they look about the same see if the spid is spending a lot of time waiting on [`ASYNC_NETWORK_IO`](http://stackoverflow.com/a/18365792/73226) – Martin Smith Dec 18 '13 at 11:14
  • Are you using custom `SET` options? This is visible in the XML plans. Compare the options that are set. Are they exactly identical? – usr Dec 21 '13 at 20:27

2 Answers2

4

You could try using IQueryable.AsNoTracking() see http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx. It is safe to use AsNoTracking() in cases where you are not going to edit the results and save them back to the database again. Usually it makes a big difference when a Query returns a large number of rows. Make sure you put System.Data.Entity in your uses if you want to use .AsNoTracking()

Mike Peterson
  • 1,109
  • 7
  • 9
1

It could be a problem of cached execution plan. Try execute stored procedure to clean query execution plans:

DBCC FREEPROCCACHE

Also this thread might be helpful: Entity Framework cached query plan performance degrades with different parameters

Community
  • 1
  • 1