
I have this query and the performance slows down drastically when I declare variables:

DECLARE @StartDate DATETIME,
        @EndDate   DATETIME

SET @StartDate = '2018-08-13'
SET @EndDate   = '2018-08-19'

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL (nolock)
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <= @EndDate

This is much slower than this SQL statement:

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL (nolock)
WHERE DD_OrderDate >= '2018-08-01'
  AND DD_OrderDate <= '2018-08-17'

Both queries will return the same results in the end.

AlvinWi
  • @AlvinWi Do you need to select all of the columns from this table? Could you show the schema and the indexes on the table? – gotqn Aug 20 '18 at 04:57
  • @gotqn I am not sure how I could show you the schema and indexes here? Not in actual query - it's just for my example which compare both queries performance – AlvinWi Aug 20 '18 at 05:00
  • Sure, but if I know the final goal, I might be able to advice you what kind of indexes to create. For example, it matters if we want to count the rows, or if you extract always particular rows. You can try at least to create index on the `DB_OrderDate` column. – gotqn Aug 20 '18 at 05:03
  • Just confirm - are the database columns dates or datetimes? – TomC Aug 20 '18 at 05:15

3 Answers

SELECT * FROM [DIME_WH].[dbo].[FactOrderLines2] FL (nolock)
WHERE DD_OrderDate >= '2018-08-01'
  AND DD_OrderDate <= '2018-08-17'

When a constant is used in the predicate, the optimizer creates a plan tailored to that exact query text. If the same query is executed again with the same value, the plan is reused; if the value changes, a new plan is compiled. The plan is therefore always optimized for the constant actually in the query, which is why the hard-coded version is fast.

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL (nolock)
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <= @EndDate

When a variable is used in the predicate, the optimizer creates the execution plan based on the first parameter values that were passed.

For example, suppose @StartDate = '2018-08-01' and @EndDate = '2018-08-07' are passed the first time. The optimizer builds a plan that is good for those values. The next time, @StartDate = '2018-08-01' and @EndDate = '2018-08-31' are passed, and the same cached plan is reused, even though it may not be optimal for that much larger range.

In other words, the plan that was optimal for the first values can be sub-optimal for later values, so the query may perform poorly. This behavior is known as parameter sniffing.

There are several ways to overcome this problem.

Parameter Sniffing
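Two of the common mitigations can be sketched in T-SQL against the question's table (which hint is appropriate depends on the workload, so treat these as options to test, not a prescription):

```sql
-- Option 1: recompile on every execution, so the plan always
-- reflects the current variable values (costs compile CPU each time).
SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2]
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <= @EndDate
OPTION (RECOMPILE);

-- Option 2: build the plan for an "average" value from the statistics
-- histogram instead of the first sniffed value.
SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2]
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <= @EndDate
OPTION (OPTIMIZE FOR UNKNOWN);
```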

Note: in this thread we are only focusing on why the variable version is slow, all other factors remaining constant.

KumarHarsh
  • First of all - there are no optimal execution plans, it's always a "Give me good enough" withing a time frame. Second - are you saying that the problem is sniffing in a ad hoc query? – Ruslan Tolkachev Aug 20 '18 at 13:42
  • the article you provided, you should probably read it all the way! – Ruslan Tolkachev Aug 20 '18 at 13:47
  • Thanks Khumar, I learnt something I should have known! Or probably forgotten over the years. – TomC Aug 20 '18 at 23:50
  • @RuslanTolkachev,Yes problem is indeed parameter sniffing. The article provided is ok.I will update my answer . – KumarHarsh Aug 21 '18 at 03:51
  • @KumarHarsh sql server does not sniff parameters in an ad hoc queries - only stored procedures and sp_executesql. This is from your article "Parameter sniffing is the process whereby SQL Server creates an optimal plan for a stored procedure by using the calling parameters that are passed the first time a stored procedure is executed. By “first time”, I really mean whenever SQL Server is forced to compile or recompile a stored procedures because it is not in the procedure cache." – Ruslan Tolkachev Aug 21 '18 at 04:17
  • So even though your comment solves some theoretical issues, it does not solve for this particular one – Ruslan Tolkachev Aug 21 '18 at 04:19
  • @RuslanTolkachev.Thanks for your comments. I think everyone here has understood that it is stored procedures and not ad hoc queries .Above was just part of script.also please read my note :"Other factor remaining constant...". Gimme time,i will update my answer and ping you – KumarHarsh Aug 21 '18 at 04:24
  • @KumarHarsh then your answer is irrelevant because it does not fit the scenario. Allwin said that there is a query and not a stored procedure. We can always chose an answer that we know and fit it in, it's just not very helpful. We are trying to answer particular question in this particular set up and not some scenarios that we create for an easy answer. – Ruslan Tolkachev Aug 21 '18 at 04:40
  • @KumarHarsh Thanks muchly for the article, I have understood better about the issue and able to solve it by just change the WHERE statement for the date range using BETWEEN. Run time reduced from 6 minutes to 12 seconds. – AlvinWi Aug 21 '18 at 06:38

This is because SQL Server does not know the values of your variables at optimization time, so it cannot use the column statistics for those specific values when making its estimate. With a bad estimate it will (most likely) just scan the whole table instead of doing an index seek.
Variable values can be "sniffed" if the query runs inside a stored procedure or is parameterized with sp_executesql.
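A minimal sketch of the sp_executesql form, with the query text adapted from the question; here the parameter values are visible to the optimizer when the plan is compiled:

```sql
DECLARE @sql NVARCHAR(MAX) = N'
    SELECT *
    FROM [DIME_WH].[dbo].[FactOrderLines2]
    WHERE DD_OrderDate >= @StartDate
      AND DD_OrderDate <= @EndDate;';

-- Second argument declares the parameters; the rest supply their values.
EXEC sp_executesql
     @sql,
     N'@StartDate DATETIME, @EndDate DATETIME',
     @StartDate = '2018-08-13',
     @EndDate   = '2018-08-19';
```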

  • This implies that the use of parameterised queries don't use indexes, which is unlikely. We definitely do not usually get table scans for most queries. Think there is something else going on here. – TomC Aug 20 '18 at 05:14
  • @TomC Yes , but if your index is not covering one and SQL Server thinks that number of rows is going to be greater of about 5000 and it need to make whole lot of Key LookUPs to the Clustered index, it just going to scan the whole table. Most likely it disregard the index because of the wrong estimate – Ruslan Tolkachev Aug 20 '18 at 05:19
  • Interesting. Quick test shows that both have the same execution plan when I create an index on the data (I used a quick table containing 50,000 dates), but the hard coded version scans a lot less rows (8) than the variable version (7000). – TomC Aug 20 '18 at 05:52
  • @TomC OK, there are a lot of other things in place such as server settings (optimise for ad hoc work load) and such. It is definitely because optimiser does not know the value of the variables at the run time, the way it's going to be wrong - depends on a lots of things, version of SQL Server as well. – Ruslan Tolkachev Aug 20 '18 at 06:12
  • @TomC ,very good question ask. Optimiser may use Index or may not use index there are several other factor beside Parameter sniffing. Index has to be fine tuned.We should definitely go for parameterised queries .In this thread we only discuss why variable performance is slow other factor remaining constant. – KumarHarsh Aug 20 '18 at 08:56
  • @TomC , no it do not implies that the use of parameterised queries don't use indexes.I just appreciated your doubts. – KumarHarsh Aug 20 '18 at 12:00
  • @RuslanTolkachev,with regard and no offence taken. Your first line is absolutely wrong. – KumarHarsh Aug 21 '18 at 03:55
  • @KumarHarsh yes - miss typed, of cause it knows it at runtime. I meant during optimization process. – Ruslan Tolkachev Aug 21 '18 at 04:08

The problem could be parameter sniffing, maybe not. I'll skip that topic since @KumarHarsh already covered it. The most important question here is: What data type is FactOrderLines.DD_OrderDate? This is important for performance reasons as well as correctness.

First for performance. If DD_OrderDate is a DATE datatype and your variables or parameters are DATETIME then the optimizer has to jump through extra hoops to utilize your index or will be forced to do a scan instead of a seek. Note the following sample data:

USE tempdb;
GO

IF OBJECT_ID('#FactOrderLines') IS NOT NULL DROP TABLE #FactOrderLines;
GO
CREATE TABLE #FactOrderLines(someId INT IDENTITY, DD_OrderDate DATETIME NOT NULL);

CREATE CLUSTERED INDEX nc_factOrderLines ON #FactOrderLines(DD_OrderDate);

INSERT #FactOrderLines(DD_OrderDate) 
SELECT TOP (10000) DATEADD(DAY,CHECKSUM(NEWID())%100, getdate())
FROM sys.all_columns;
GO

Now let's compare the execution plans for the following queries:

-- AS DATE
DECLARE @StartDate DATE = '2018-08-01',
        @EndDate   DATE = '2018-08-20';

SELECT * 
FROM #FactOrderLines
WHERE DD_OrderDate >= @StartDate
AND   DD_OrderDate <= @EndDate
OPTION (RECOMPILE)
GO

-- AS DATETIME
DECLARE @StartDate DATETIME = '2018-08-01',
        @EndDate   DATETIME = '2018-08-31';

SELECT * 
FROM #FactOrderLines
WHERE DD_OrderDate >= @StartDate
AND   DD_OrderDate <= @EndDate
OPTION (RECOMPILE);

Execution plans:

(execution plan comparison screenshot not reproduced here)

For this reason, make sure your variables and parameters use the same datatype as the column they are compared against.

Now about correctness; note this query:

DECLARE @StartDate DATE = '2018-08-01',
        @EndDate   DATE = '2018-08-20';

SELECT
  [getdate as datetime]  = GETDATE(),
  [@enddate as datetime] = CAST(@EndDate AS DATETIME),
  [getdate as date]      = CAST(GETDATE() AS DATE),
  [datetime equality]    = IIF(GETDATE() > @EndDate,'yep','nope'),
  [date equality]        = IIF(CAST(GETDATE() AS DATE) > @EndDate,'yep','nope');

Results:

getdate as datetime     @enddate as datetime    getdate as date datetime equality date equality
----------------------- ----------------------- --------------- ----------------- -------------
2018-08-20 13:52:46.247 2018-08-20 00:00:00.000 2018-08-20      yep               nope

Values typed as DATE convert to DATETIME with the time portion set to midnight (00:00:00.000), which is why the DATETIME comparison above returns 'yep' while the DATE comparison returns 'nope'.
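One common way to sidestep that midnight-boundary problem (sketched against the question's query, not the poster's actual code) is a half-open range, so rows stamped later in the day on the end date are not silently dropped:

```sql
DECLARE @StartDate DATETIME = '2018-08-13',
        @EndDate   DATETIME = '2018-08-19';

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2]
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <  DATEADD(DAY, 1, @EndDate);  -- includes all of Aug 19, any time of day
```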

Alan Burstein
  • I agree that you absolutely have to use same datatypes as in your columns and cast them when you need to but as you can see in your execution play snap shot they are making same operation (seek and seek with 100% cost on the clustered index. And please note, that he has implicit conversion going on with hard coded values as well. – Ruslan Tolkachev Aug 20 '18 at 20:21