
I have a question about Entity Framework query execution performance.

Schema:

I have a table structure like this:

CREATE TABLE [dbo].[DataLogger]
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [ProjectID] [bigint] NULL,
    CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

CREATE TABLE [dbo].[DCDistributionBox]
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DataLoggerID] [bigint] NOT NULL,
    CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

ALTER TABLE [dbo].[DCDistributionBox]
    ADD CONSTRAINT [FK_DCDistributionBox_DataLogger] 
    FOREIGN KEY([DataLoggerID]) REFERENCES [dbo].[DataLogger] ([ID])

CREATE TABLE [dbo].[DCString] 
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [DCDistributionBoxID] [bigint] NOT NULL,
    [CurrentMPP] [decimal](18, 2) NULL,
    CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC )
)

ALTER TABLE [dbo].[DCString]
    ADD CONSTRAINT [FK_DCString_DCDistributionBox] 
    FOREIGN KEY([DCDistributionBoxID]) REFERENCES [dbo].[DCDistributionBox] ([ID])

CREATE TABLE [dbo].[StringData]
(
    [DCStringID] [bigint] NOT NULL,
    [TimeStamp] [datetime] NOT NULL,
    [DCCurrent] [decimal](18, 2) NULL,
    CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC)
)

CREATE NONCLUSTERED INDEX [TimeStamp_DCCurrent-NonClusteredIndex] 
ON [dbo].[StringData] ([DCStringID] ASC, [TimeStamp] ASC)
INCLUDE ([DCCurrent])

Standard indexes on the foreign keys also exist (I don't want to list them all for space reasons).

The [StringData] table has the following storage stats:

  • Data space: 26,901.86 MB
  • Row count: 131,827,749
  • Partitioned: true
  • Partition count: 62

Usage:

I now want to group the data in the [StringData] table and do some aggregation.

I created an Entity Framework query (detailed information about the query can be found here):

var compareData = model.StringDatas
    .AsNoTracking()
    .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= fromDate && p.TimeStamp < tillDate)
    .Select(d => new
    {
        TimeStamp = d.TimeStamp,
        DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
    })
    .GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
    .Select(d => new
    {
        TimeStamp = d.Key,
        DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
        DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
        DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
        DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
    })
    .ToList();
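
For clarity, the GroupBy key simply floors each timestamp to the start of its minuteInterval-minute bucket; in plain C# (outside EF) the key computation is equivalent to this sketch:

```csharp
using System;

class BucketDemo
{
    // Equivalent of the DbFunctions.AddMinutes/DiffMinutes expression:
    // floor the timestamp to the start of its minuteInterval-minute bucket.
    static DateTime FloorToInterval(DateTime t, int minuteInterval)
    {
        long minutes = (long)(t - DateTime.MinValue).TotalMinutes;
        return DateTime.MinValue.AddMinutes(minutes / minuteInterval * minuteInterval);
    }

    static void Main()
    {
        // 2016-02-06 12:37 falls into the 12:30 bucket for a 15-minute interval
        Console.WriteLine(FloorToInterval(new DateTime(2016, 2, 6, 12, 37, 0), 15));
    }
}
```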

The execution time is exceptionally long!?

  • Execution result: 92 rows
  • Execution time: ~16,000 ms

Attempts:

I then took a look at the Entity Framework generated SQL query; it looks like this:

DECLARE @p__linq__4 DATETIME = 0;
DECLARE @p__linq__3 DATETIME = 0;
DECLARE @p__linq__5 INT = 15;
DECLARE @p__linq__6 INT = 15;
DECLARE @p__linq__0 BIGINT = 20827;
DECLARE @p__linq__1 DATETIME = '06.02.2016 00:00:00';
DECLARE @p__linq__2 DATETIME = '07.02.2016 00:00:00';

SELECT 
1 AS [C1], 
[GroupBy1].[K1] AS [C2], 
[GroupBy1].[A1] AS [C3], 
[GroupBy1].[A2] AS [C4], 
[GroupBy1].[A3] AS [C5], 
[GroupBy1].[A4] AS [C6]
FROM ( SELECT 
    [Project1].[K1] AS [K1], 
    MIN([Project1].[A1]) AS [A1], 
    MAX([Project1].[A2]) AS [A2], 
    AVG([Project1].[A3]) AS [A3], 
    STDEVP([Project1].[A4]) AS [A4]
    FROM ( SELECT 
        DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
        [Project1].[C1] AS [A1], 
        [Project1].[C1] AS [A2], 
        [Project1].[C1] AS [A3], 
        [Project1].[C1] AS [A4]
        FROM ( SELECT 
            [Extent1].[TimeStamp] AS [TimeStamp], 
            [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
            FROM    [dbo].[StringData] AS [Extent1]
            INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
            INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
            INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
            WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
        )  AS [Project1]
    )  AS [Project1]
    GROUP BY [K1]
)  AS [GroupBy1]

I copied this SQL query into SSMS on the same machine, connected with the same connection string as the Entity Framework.

The result is a dramatically improved performance:

  • Execution result: 92 rows
  • Execution time: 517 ms

I also ran the query in a loop, and the results are strange. The test looks like this:

for (int i = 0; i < 50; i++)
{
    DateTime begin = DateTime.UtcNow;

    [...query...]

    TimeSpan executionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run: {1}", i, executionTimeSpan.ToString());
}

The results vary a lot and look random(?):

0th run: 00:00:11.0618580
1th run: 00:00:11.3339467
2th run: 00:00:10.0000676
3th run: 00:00:10.1508140
4th run: 00:00:09.2041939
5th run: 00:00:07.6710321
6th run: 00:00:10.3386312
7th run: 00:00:17.3422765
8th run: 00:00:13.8620557
9th run: 00:00:14.9041528
10th run: 00:00:12.7772906
11th run: 00:00:17.0170235
12th run: 00:00:14.7773750

Question:

Why is Entity Framework query execution so slow? The resulting row count is really low and the raw SQL query shows a very fast performance.

Update 1:

I made sure it's not a MetaContext or model-creation delay; some other queries are executed on the same model instance right before, with good performance.

Update 2 (related to the answer of @x0007me):

Thanks for the hint, but this can be ruled out by changing the model settings like this:

modelContext.Configuration.UseDatabaseNullSemantics = true;

The EF generated SQL is now:

SELECT 
1 AS [C1], 
[GroupBy1].[K1] AS [C2], 
[GroupBy1].[A1] AS [C3], 
[GroupBy1].[A2] AS [C4], 
[GroupBy1].[A3] AS [C5], 
[GroupBy1].[A4] AS [C6]
FROM ( SELECT 
    [Project1].[K1] AS [K1], 
    MIN([Project1].[A1]) AS [A1], 
    MAX([Project1].[A2]) AS [A2], 
    AVG([Project1].[A3]) AS [A3], 
    STDEVP([Project1].[A4]) AS [A4]
    FROM ( SELECT 
        DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
        [Project1].[C1] AS [A1], 
        [Project1].[C1] AS [A2], 
        [Project1].[C1] AS [A3], 
        [Project1].[C1] AS [A4]
        FROM ( SELECT 
            [Extent1].[TimeStamp] AS [TimeStamp], 
            [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
            FROM    [dbo].[StringData] AS [Extent1]
            INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
            INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
            INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
            WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
        )  AS [Project1]
    )  AS [Project1]
    GROUP BY [K1]
)  AS [GroupBy1]

So you can see that the problem you described is now solved, but the execution time does not change.

Also, as you can see from the schema and the raw execution time, I am using an optimized structure with highly optimized indexes.

Update 3 (related to the answer of @Vladimir Baranov):

I don't see how this can be related to query plan caching, because MSDN clearly states that EF6 makes use of query plan caching.

A simple test proves that the huge execution time difference is not related to query plan caching (pseudocode):

using(var modelContext = new ModelContext())
{
    modelContext.Query(); // 1st run populates the plan cache

    modelContext.Query(); // 2nd run uses the cached plan
}

As a result, both queries run with the same execution time.

Update 4 (related to the answer of @bubi):

I tried to run the query generated by EF exactly as you described:

int result = model.Database.ExecuteSqlCommand(@"SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEVP([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, 0, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, 0) AS [K1], 
            [Project1].[C1] AS [A1], 
            [Project1].[C1] AS [A2], 
            [Project1].[C1] AS [A3], 
            [Project1].[C1] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]",
    new SqlParameter("p__linq__0", 20827),
    new SqlParameter("p__linq__1", fromDate),
    new SqlParameter("p__linq__2", tillDate),
    new SqlParameter("p__linq__5", 15),
    new SqlParameter("p__linq__6", 15));

  • Execution result: 92 rows
  • Execution time: ~16,000 ms

It took exactly as long as the normal EF query!?

Update 5 (related to the answer of @vittore):

I created a call tree trace; maybe it helps:

call tree trace

Update 6 (related to the answer of @usr):

I created two showplan XML files via SQL Server Profiler.

Fast run (SSMS).SQLPlan

Slow run (EF).SQLPlan

Update 7 (related to the comments of @VladimirBaranov):

I have now run some more test cases related to your comments.

First, I eliminated the time-consuming ordering operations by adding a new computed column and a matching index. This removes the performance lag related to DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp]) / 15 * 15, 0). Details of how and why can be found here.
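
The computed column and matching index look roughly like this (a sketch; the PERSISTED option and the index name are my assumption, the actual definition may differ):

```sql
ALTER TABLE [dbo].[StringData]
ADD [TimeStamp15Minutes] AS
    DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp]) / 15 * 15, 0) PERSISTED

CREATE NONCLUSTERED INDEX [TimeStamp15Minutes_DCCurrent-NonClusteredIndex]
ON [dbo].[StringData] ([DCStringID] ASC, [TimeStamp15Minutes] ASC)
INCLUDE ([DCCurrent])
```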

The results look like this:

Pure Entity Framework query:

for (int i = 0; i < 3; i++)
{
    DateTime begin = DateTime.UtcNow;
    var result = model.StringDatas
        .AsNoTracking()
        .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp15Minutes >= fromDate && p.TimeStamp15Minutes < tillDate)
        .Select(d => new
        {
            TimeStamp = d.TimeStamp15Minutes,
            DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
        })
        .GroupBy(d => d.TimeStamp)
        .Select(d => new
        {
            TimeStamp = d.Key,
            DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
            DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
            DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
            DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
        })
        .ToList();

    TimeSpan executionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run pure EF: {1}", i, executionTimeSpan.ToString());
}

0th run pure EF: 00:00:12.6460624

1th run pure EF: 00:00:11.0258393

2th run pure EF: 00:00:08.4171044

I then used the EF-generated SQL as a raw SQL query:

for (int i = 0; i < 3; i++)
{
    DateTime begin = DateTime.UtcNow;
    int result = model.Database.ExecuteSqlCommand(@"SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [TimeStamp15Minutes], 
        [GroupBy1].[A1] AS [C2], 
        [GroupBy1].[A2] AS [C3], 
        [GroupBy1].[A3] AS [C4], 
        [GroupBy1].[A4] AS [C5]
        FROM ( SELECT 
            [Project1].[TimeStamp15Minutes] AS [K1], 
            MIN([Project1].[C1]) AS [A1], 
            MAX([Project1].[C1]) AS [A2], 
            AVG([Project1].[C1]) AS [A3], 
            STDEVP([Project1].[C1]) AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
            )  AS [Project1]
            GROUP BY [Project1].[TimeStamp15Minutes]
        )  AS [GroupBy1];",
    new SqlParameter("p__linq__0", 20827),
    new SqlParameter("p__linq__1", fromDate),
    new SqlParameter("p__linq__2", tillDate));

    TimeSpan executionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run: {1}", i, executionTimeSpan.ToString());
}

0th run: 00:00:00.8381200

1th run: 00:00:00.6920736

2th run: 00:00:00.7081006

and with OPTION(RECOMPILE):

for (int i = 0; i < 3; i++)
{
    DateTime begin = DateTime.UtcNow;
    int result = model.Database.ExecuteSqlCommand(@"SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [TimeStamp15Minutes], 
        [GroupBy1].[A1] AS [C2], 
        [GroupBy1].[A2] AS [C3], 
        [GroupBy1].[A3] AS [C4], 
        [GroupBy1].[A4] AS [C5]
        FROM ( SELECT 
            [Project1].[TimeStamp15Minutes] AS [K1], 
            MIN([Project1].[C1]) AS [A1], 
            MAX([Project1].[C1]) AS [A2], 
            AVG([Project1].[C1]) AS [A3], 
            STDEVP([Project1].[C1]) AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
            )  AS [Project1]
            GROUP BY [Project1].[TimeStamp15Minutes]
        )  AS [GroupBy1]
        OPTION(RECOMPILE);",
    new SqlParameter("p__linq__0", 20827),
    new SqlParameter("p__linq__1", fromDate),
    new SqlParameter("p__linq__2", tillDate));

    TimeSpan executionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine("{0}th run: {1}", i, executionTimeSpan.ToString());
}

0th run with RECOMPILE: 00:00:00.8260932

1th run with RECOMPILE: 00:00:00.9139730

2th run with RECOMPILE: 00:00:01.0680665

The same SQL query executed in SSMS (without RECOMPILE):

00:00:01.105

The same SQL query executed in SSMS (with RECOMPILE):

00:00:00.902

I hope these are all the values you needed.

Steffen Mangold
  • Thanks @Eric J. copy-paste error :) – Steffen Mangold Feb 17 '16 at 00:05
    If you run the query in EF and then in SSMS, much of the data will be in cache the second time. Make sure you account for that when measuring execution speed (e.g. if you run the EF version again after the SSMS version, is it back to 16 seconds?) – Eric J. Feb 17 '16 at 00:39
  • I did try it, and yes, same result. :/ – Steffen Mangold Feb 17 '16 at 00:41
    [Slow in the Application, Fast in SSMS](http://www.sommarskog.se/query-plan-mysteries.html) will explain why you see this difference in performance. – Vladimir Baranov Feb 17 '16 at 01:01
  • I didn't think this was the problem, because query plan caching is on by default in EF. That means that if you execute the same query two times, the second should be much faster, but it is not. Please see [MSDN](https://msdn.microsoft.com/en-us/data/hh949853.aspx) – Steffen Mangold Feb 17 '16 at 01:11
    @SteffenMangold, The query that is run by EF has its own plan. The "same" query that you run in SSMS will have its own plan, which is likely to be different. Both of these plans will be cached and reused by SQL Server engine if possible. The article explains in detail how to deal with it. I personally use `OPTION(RECOMPILE)` with queries of this kind, i.e. those that have [Dynamic Search Conditions](http://www.sommarskog.se/dyn-search.html), like you have: `([Extent4].[ProjectID] = @p__linq__0) OR (@p__linq__0 IS NULL)`. – Vladimir Baranov Feb 17 '16 at 02:21
  • @Vladimir Baranov with *"execute the same query two times"* I mean two times inside EF on the same ModelContext instance. So EF should store the execution plan according to MSDN. – Steffen Mangold Feb 17 '16 at 02:41
    @SteffenMangold, I was talking about [execution plan in the SQL Server engine](https://www.simple-talk.com/sql/performance/execution-plan-basics/). I don't know what is meant by EF execution plan. – Vladimir Baranov Feb 17 '16 at 02:49
  • The only difference is the materialization process. Over 92 rows it should take a few ms, probably less. You could try to run the query with the same EF connection, without the materialization process, using `modelContext.Database.ExecuteSqlCommand(...)` – bubi Feb 19 '16 at 07:39
  • Please take a look at my detailed response on update 4 (above) @bubi. – Steffen Mangold Feb 19 '16 at 15:45
  • Did you try to run dotTrace or similar to see what that time is actually used for? It might've been an EF bug – vittore Feb 19 '16 at 16:16
  • I created a call tree, please see above @vittore. – Steffen Mangold Feb 19 '16 at 21:30
    Did you really read the link (http://www.sommarskog.se/query-plan-mysteries.html)? There are a lot of reasons given there. Differing session settings are the prime reason for this problem. Do this: Capture the execution plans of slow and fast version using SQL Profiler and the XML Showplan event. Post both plans as files somewhere (some file hoster). – usr Feb 19 '16 at 21:55
  • Just to compare what is going on, can you change the code to `await ... .ToListAsync()` and compare to what you have? – vittore Feb 19 '16 at 21:55
    @vittore async does not change what the database does. The DB cannot even find out that you did it even if it wanted to. – usr Feb 19 '16 at 21:56
    Also, why did you partition DCString and on what scheme? Often, people partition when they shouldn't. That's why I'm asking. – usr Feb 19 '16 at 21:58
  • @usr I suspect it is not a database issue and has something to do with TDS query execution in System.Data internals – vittore Feb 19 '16 at 21:59
    @vittore alright, that's a reasonable hypothesis but the profiler picture shows almost all time spent waiting for SQL Server. – usr Feb 19 '16 at 22:02
  • @usr and that is what concerns me; other people report the same issue when another request to the same db is performed at the same time, also some people report that it was only happening with a remote server but not a local one, and the issue disappeared after some (unknown to the user) network reconfiguration – vittore Feb 19 '16 at 22:05
  • @usr we had a not exactly the same, but imo related, issue where an exception was thrown because of a timeout exactly in this place when connecting to a remote server, and it was fine for a local one; after several Windows/SQL updates it disappeared as well, unfortunately I still have not figured out which updates did it. – vittore Feb 19 '16 at 22:07
  • @usr I created both Showplans, please see Update 6. many thanks for all your great help so far. :) – Steffen Mangold Feb 19 '16 at 23:05
    Let's see if this is a parameter sniffing problem. Put DBCC FREEPROCCACHE before the query in SSMS and in the raw SQL version with EF. See if performance is now the same. It should be. – usr Feb 19 '16 at 23:34
  • I tried it, with no difference. But what I noticed is that if I write the value directly into the SQL query, like `WHERE ([Extent4].[ProjectID] = 20827)`, instead of using DECLARE and `WHERE ([Extent4].[ProjectID] = @p__linq__0)`, then the SSMS query gets as slow as the EF query. – Steffen Mangold Feb 19 '16 at 23:56
  • I now really think it is a caching thing. As I see from the query plan, most of the time is taken by the ORDER from the 15-min grouping; this should be slow in every case (surely never under 1 sec). If I take this out of the query, both cases run at the same speed. So I don't think it is some showplan cache, but some kind of caching of an index from the grouping statement. Is this possible? – Steffen Mangold Feb 20 '16 at 00:06
  • I think Vladimir Baranov, Eric J., usr, bubi or vittore should write an answer. I could also write one as a complete conclusion, but I only got there with the help of you guys, and one of you deserves the bounty. – Steffen Mangold Feb 20 '16 at 00:25
    It looks like you completely missed the point of the [Erland's article](http://www.sommarskog.se/query-plan-mysteries.html). OK. Simple test. Add the line `OPTION(RECOMPILE)` to the query. I don't know how to make EF add this line to the generated code. Try to add it to the variant that you tried in `update 4`. With `OPTION(RECOMPILE)` I expect that both queries (from SSMS and from the EF application) would run for the same time. It could be slow, it could be fast, but it should be the same time. – Vladimir Baranov Feb 21 '16 at 11:51
  • If I run the query in SSMS with `OPTION (OPTIMIZE FOR UNKNOWN, RECOMPILE);` it runs in under 1 sec (as fast as without). But the query timespan changes dramatically if I run `DBCC DROPCLEANBUFFERS;` first. I don't know if this is important @VladimirBaranov!? – Steffen Mangold Feb 22 '16 at 02:59
    @SteffenMangold, I was focused on the difference in run time between SSMS and your EF application, as stated in the question title. So, if you run the **same** query with `OPTION(RECOMPILE)`, not some other option, but just `RECOMPILE`, does it take the same time for the query to run both in SSMS and in EF application? – Vladimir Baranov Feb 22 '16 at 03:06
  • No, running it with or without brings no difference in either scenario. – Steffen Mangold Feb 22 '16 at 03:32
  • Please look at my detailed Update 7 @VladimirBaranov; I hope all the cases you needed are listed. Thanks a thousand times for your great support! – Steffen Mangold Feb 22 '16 at 10:04

4 Answers


In this answer I'm focusing on the original observation: the query generated by EF is slow, but when the same query is run in SSMS it is fast.

One possible explanation of this behaviour is Parameter sniffing.

SQL Server uses a process called parameter sniffing when it executes stored procedures that have parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create an execution plan. That value is then stored with the execution plan in the plan cache. On subsequent executions, that same value – and same plan – is used.

So, EF generates a query that has a few parameters. The first time you run this query, the server creates an execution plan using the parameter values that were in effect on that first run. That plan is usually pretty good. But later on, you run the same EF query with other parameter values. It is possible that for the new parameter values the previously generated plan is not optimal, and the query becomes slow. The server keeps using the previous plan, because it is still the same query; only the parameter values are different.

If at this moment you take the query text and run it directly in SSMS, the server will create a new execution plan, because technically it is not the same query that was issued by the EF application. Even a one-character difference is enough; any change in the session settings is also enough for the server to treat the query as a new one. As a result, the server has two plans for the seemingly same query in its cache. The first "slow" plan is slow for the new parameter values, because it was originally built for different ones. The second "fast" plan is built for the current parameter values, so it is fast.

The article Slow in the Application, Fast in SSMS by Erland Sommarskog explains this and other related areas in much more detail.

There are several ways to discard cached plans and force the server to regenerate them. Changing the table or changing the table indexes should do it - it should discard all plans that are related to this table, both "slow" and "fast". Then you run the query in EF application with new values of parameters and get a new "fast" plan. You run the query in SSMS and get a second "fast" plan with new values of parameters. The server still generates two plans, but both plans are fast now.
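
As an illustration, one way to see both cached plans side by side, and to flush them while testing, is via the plan cache DMVs. A rough sketch (requires VIEW SERVER STATE permission; DBCC FREEPROCCACHE clears the whole server's plan cache, so use it in development only):

```sql
-- Both the "slow" (EF) and "fast" (SSMS) plans should show up as separate rows
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%StringData%'

-- Heavy-handed reset: discard all cached plans (development only)
DBCC FREEPROCCACHE
```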

Another variant is adding OPTION(RECOMPILE) to the query. With this option the server would not store the generated plan in its cache. So, every time the query runs the server would use actual parameter values to generate the plan that (it thinks) would be optimal for the given parameter values. The downside is an added overhead of the plan generation.

Mind you, the server still could choose a "bad" plan with this option due to outdated statistics, for example. But, at least, parameter sniffing would not be a problem.


Those who wonder how to add OPTION (RECOMPILE) hint to the query that is generated by EF have a look at this answer:

https://stackoverflow.com/a/26762756/4116017
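
The gist of that answer is an EF6 command interceptor that appends the hint before the command reaches SQL Server. A rough sketch (the matching condition here is deliberately naive and is my simplification, not the exact code from the linked answer):

```csharp
using System;
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

public class RecompileInterceptor : DbCommandInterceptor
{
    public override void ReaderExecuting(DbCommand command,
        DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // Append OPTION(RECOMPILE) to SELECTs that don't already have it.
        string sql = command.CommandText;
        if (sql.TrimStart().StartsWith("SELECT", StringComparison.OrdinalIgnoreCase)
            && !sql.Contains("OPTION(RECOMPILE)"))
        {
            command.CommandText = sql + Environment.NewLine + "OPTION(RECOMPILE)";
        }
        base.ReaderExecuting(command, interceptionContext);
    }
}

// Register once at startup:
// DbInterception.Add(new RecompileInterceptor());
```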

Vladimir Baranov
  • Thanks @VladimirBaranov! I can 100% confirm your hint about parameter sniffing. One simple test: I searched for all execution plans of my EF query and deleted them as described [here](http://stackoverflow.com/a/30444176/1112048). After this I ran the EF query first, and it got as fast as I would expect (<1 sec). I think you came up with the correct answer. I would recommend that all people who run into the same problem support this EF [UserVoice](https://data.uservoice.com/forums/72025-entity-framework-feature-suggestions/suggestions/6566335-support-for-option-recompile-on-generated-querie). – Steffen Mangold Feb 22 '16 at 13:27
    I agree that this is the best answer, so I'm happy to add my vote here as well as the User Voice. Happy coding to everyone! – Ivan Stoev Feb 22 '16 at 13:50
  • @SteffenMangold, Thank you. I'm glad you found this answer useful. I never used EF, so I don't know what is the proper EF-style way to add this option to the generated query. At least, now it should be easier for you to figure out what is going on and what to do if you see such problem. – Vladimir Baranov Feb 22 '16 at 21:40
    Thank you so much for the question @SteffenMangold and for the answer @VladimirBaranov! We have the same behavior and were struggling for months to pinpoint the problem. But adding the RECOMPILE to the query did the trick. – Martin Aug 29 '22 at 06:32

I know I'm a bit late here, but since I've participated in the building of the query in question, I feel obliged to take some action.

The general problem I see with LINQ to Entities queries is that the typical way we build them introduces unnecessary parameters, which may affect the cached database query plan (the so-called SQL Server parameter sniffing problem).

Let's take a look at your query's GroupBy expression:

d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval)

Since minuteInterval is a variable (i.e. non-constant), it introduces a parameter. The same goes for DateTime.MinValue (note that the primitive types expose similar things as constants, but for DateTime, decimal, etc. they are static readonly fields, which makes a big difference in how they are treated inside expressions).

But regardless of how it's represented in the CLR type system, DateTime.MinValue is logically a constant. Whether minuteInterval is depends on your usage.

My attempt to solve the issue would be to eliminate all the parameters related to that expression. Since we cannot do that with a compiler-generated expression, we need to build it manually using System.Linq.Expressions. The latter is not intuitive, but fortunately we can use a hybrid approach.

First, we need a helper method which allows us to replace expression parameters:

public static class ExpressionUtils
{
    public static Expression ReplaceParemeter(this Expression expression, ParameterExpression source, Expression target)
    {
        return new ParameterReplacer { Source = source, Target = target }.Visit(expression);
    }

    class ParameterReplacer : ExpressionVisitor
    {
        public ParameterExpression Source;
        public Expression Target;
        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == Source ? Target : base.VisitParameter(node);
        }
    }
}

Now we have everything needed. Let's encapsulate the logic inside a custom method:

public static class QueryableUtils
{
    public static IQueryable<IGrouping<DateTime, T>> GroupBy<T>(this IQueryable<T> source, Expression<Func<T, DateTime>> dateSelector, int minuteInterval)
    {
        // Floor the date to its minuteInterval-minute bucket, exactly like the
        // original expression, but with the base date and the interval
        // embedded as constants instead of parameters.
        Expression<Func<DateTime, DateTime, int, DateTime>> expr = (date, baseDate, interval) =>
            DbFunctions.AddMinutes(baseDate, DbFunctions.DiffMinutes(baseDate, date) / interval * interval).Value;
        var selector = Expression.Lambda<Func<T, DateTime>>(
            expr.Body
            .ReplaceParemeter(expr.Parameters[0], dateSelector.Body)
            .ReplaceParemeter(expr.Parameters[1], Expression.Constant(DateTime.MinValue))
            .ReplaceParemeter(expr.Parameters[2], Expression.Constant(minuteInterval))
            , dateSelector.Parameters[0]
        );
        return source.GroupBy(selector);
    }
}

Finally, replace

.GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))

with

.GroupBy(d => d.TimeStamp, minuteInterval)

and the generated SQL query would be like this (for minuteInterval = 15):

SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEVP([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, convert(datetime2, '0001-01-01 00:00:00.0000000', 121), [Project1].[TimeStamp])) / 15) * 15, convert(datetime2, '0001-01-01 00:00:00.0000000', 121)) AS [K1], 
            [Project1].[C1] AS [A1], 
            [Project1].[C1] AS [A2], 
            [Project1].[C1] AS [A3], 
            [Project1].[C1] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringDatas] AS [Extent1]
                INNER JOIN [dbo].[DCStrings] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBoxes] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLoggers] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]

As you may see, we successfully eliminated some of the query parameters. Will that help? Well, as with any database query tuning, it might or might not. You need to try and see.

Ivan Stoev

The DB engine determines the plan for each query based on how it is called. In the case of your EF LINQ query, the plan is prepared in such a way that each input parameter is treated as an unknown (since the DB has no idea what's coming in). In your actual query, all your parameters are part of the query text, so it will run under a different plan than the parameterized one. One affected piece that I see immediately is

...(@p__linq__0 IS NULL)..

This is FALSE, since @p__linq__0 = 20827 and is NOT NULL, so the first half of your WHERE clause is FALSE to begin with and does not need to be evaluated any further. For the LINQ query, the DB has no idea what's coming in, so it evaluates everything anyway.

You'll need to see if you can use indexes or other techniques to make this run faster.

Milan

When EF runs the query, it wraps it and runs it with sp_executesql, which means the execution plan will be cached in the stored-procedure execution plan cache. Due to differences (parameter sniffing etc.) in how the raw SQL statement vs the sp_executesql version have their execution plans built, the two can differ.

When running the EF (sp_executesql-wrapped) version, SQL Server is most likely using a more generic execution plan that covers a wider range of timestamps than the values you are actually passing in.

That said, to reduce the chance of SQL Server trying something "funny" with hash joins etc., the first things I would do are:

1) Index the columns used in the WHERE clause and in the joins:

create index ix_DataLogger_ProjectID on DataLogger (ProjectID);
create index ix_DCDistributionBox_DataLoggerID on DCDistributionBox (DataLoggerID);
create index ix_DCString_DCDistributionBoxID on DCString (DCDistributionBoxID);

2) Use explicit joins in the LINQ query to eliminate the `or ProjectID is null` part
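
For point 2, a sketch of what explicit joins could look like (entity-set names are assumed from the schema above; shape only, untested):

```csharp
// Explicit joins on the FK columns instead of navigation-property paths,
// so EF does not have to expand p.DCString.DCDistributionBox.DataLogger.
var compareData = (
    from sd in model.StringDatas.AsNoTracking()
    join s in model.DCStrings on sd.DCStringID equals s.ID
    join b in model.DCDistributionBoxes on s.DCDistributionBoxID equals b.ID
    join dl in model.DataLoggers on b.DataLoggerID equals dl.ID
    where dl.ProjectID == projectID
       && sd.TimeStamp >= fromDate
       && sd.TimeStamp < tillDate
    select new { sd.TimeStamp, DCCurrentMpp = sd.DCCurrent / s.CurrentMPP })
    .ToList();
```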

KristoferA
  • Thanks @KristoferA. **1)** I already solved this but forgot to copy the indexes into my question schema (added now). **2)** Please see **Update 2**, which also solved this – Steffen Mangold Feb 17 '16 at 00:34
  • The standard indexes on the foreign keys also exist. Sorry, I didn't think I had to mention them. – Steffen Mangold Feb 17 '16 at 00:50
  • Actually running the identical query `exec sp_executesql N'...query...'` that EF runs is STILL dramatically faster via SSMS than via EF. Does running it that way not bypass any execution plan differences? – Mir Sep 18 '17 at 23:33