Variable vs string literal speed disparity

Question

I have a simple query that runs very quickly (about 1 second) when I use a string literal in my WHERE clause, such as:

select *
from table
where theDate >= '6/5/2016'

However, this query (which is not materially different) runs in over 2 minutes:

declare @thisDate as date
set @thisDate = '6/5/2016'

select *
from table
where theDate >= @thisDate

I tried using OPTION(RECOMPILE) as suggested here but it does nothing to improve the performance. the column theDate is a datetime, this database was recently migrated from SQL Server 2005.

There is no index on my theDate column, and the table has just over 1 billion rows. It's a shared resource and I don't want to start indexing without some assurance that it will help.

I find that using logic instead of a variable provides the same performance as a string literal:

select *
from table
where theDate >= dateadd(dd, -23, getdate())

But, if I replace the date integer with a variable integer the performance is again hindered.

How can I include a variable and maintain performance?

EDIT

Actual query included by request:

DECLARE @days INT

Set @days = 7

select c.DEBT_KEY
       , c.new_value
       , c.CHANGE_DATE
from changes c with (nolock)
where c.C_CODE = 3
and c.old_value = 4
and c.CHANGE_DATE >= dateadd(dd, -@days, getdate())

No joins.

Query Plans

With Variable (xml explain plan):

With string literal (xml explain plan):

So I can see the difference is that the variable invokes a Clustered Index Scan (clustered) while the string literal invokes a Key Lookup (clustered)... I will need to refer to google because I don't know really anything about the performance pros/cons of these.

EDIT EDIT

This worked (xml explain plan):

DECLARE @days INT

Set @days = 7

select c.DEBT_KEY
       , c.new_value
       , c.CHANGE_DATE
from changes c with (nolock)
where c.CHANGE_CODE = 3
and c.old_value = 4
and c.CHANGE_DATE >= dateadd(dd, -@days, getdate())
OPTION(OPTIMIZE FOR (@days = 7))

... I don't know why. Also I dislike the solution as it negates my purpose of using a variable, which is to put all variables at the top of the proc in order to mitigate the need to poke around in the code during the inevitable maintenance.

over 1 billion rows and no index - wow! Searching this lot for a literal should take quite long as well... I'd assume there was some kind of cached result left... Did you compare the execution plans? * I don't want to start indexing without some assurance that it will help* Well I can assure you that an index will help :-) — Shnugo, Jun 29 '16 at 22:09
Well, there's an index, but not on `theDate`, I guess I could have been clearer on that :P — n8., Jun 29 '16 at 22:12
With so many rows you should use indexes on all columns you want to use in sort, filter or join operations... An index is - easy spoken - a sorted list of the values of this column. To find a given value in this list is an extremely fast process (divide in half, look if bigger or smaller, ah, it's bigger! Divide the upper half and so on...) Once found, all values are sitting together as one block. Now imagine an unsorted heap. Your query'd have to scan the whole table row-by-row. — Shnugo, Jun 29 '16 at 22:19
Sounds reasonable. The table wasn't built for analytics, as I am using it for. It's transactional. So as the purpose changes, so must the database. — n8., Jun 29 '16 at 22:22
I had the same situation and it was because of parameter sniffing. In my case using OPTION(RECOMPILE) fixed the issue. it is weird that you have a different case — FLICKER, Jun 29 '16 at 23:14
The performance of the query with `OPTION(RECOMPILE)` should be the same as with the literal. And it should be slow for 1B rows without an index. Show us the exact query that you run with this option. One more thing to try: use exactly the same types for the `@thisDate` variable and `theDate` column. — Vladimir Baranov, Jun 30 '16 at 00:16
Your first stop should be the actual query plan. https://www.mssqltips.com/sqlservertutorial/2252/estimated-vs-actual-query-plan/. Any other analysis will have caching clouding the issue. So get actual/estimated query plans for fast and slow queries and compare. BTW, the "I don't want to start indexing without some assurance that it will help" attitude is an _excellent_ approach - don't just throw indexes at it unless you understand the problem and have captured some before and after measurements — Nick.Mc, Jun 30 '16 at 00:21
I have a suspicion that it is not full `where` clause provided, otherwise in both cases must be Full Table Scan, which will be pretty long. IMHO `OPTION` should not help, because there is no index on that field. — Slava Murygin, Jun 30 '16 at 01:54
@Nick.McDermaid I did look at the query plans but am essentially illiterate when it comes to them, and I only look at them when queries run unexpectedly slow... btw that link kicks back a 404 error. — n8., Jun 30 '16 at 14:17
@VladimirBaranov "One more thing to try: use exactly the same types for the `@thisDate` variable and `theDate` column" I did that, no joy. — n8., Jun 30 '16 at 14:23
Well, it is still strange why `OPTION(RECOMPILE)` doesn't generate the same plan as with literals. But, usually the first question is: did you update the statistics / rebuilt the indexes? Then, to understand what is going on it is necessary to know two **actual** queries that you run (not simplified version). The definition (DDL) of the table(s) and all indexes. The cardinality: total number of rows in a table, rows with `C_CODE = 3`, with `CHANGE_DATE >= ...`, i.e. how efficient are the filters. How many rows the query returns. And XMLs of both fast and slow actual plans (use pastebin.com) — Vladimir Baranov, Jun 30 '16 at 15:01
@VladimirBaranov ok, explain plans now linked. Thanks for your attention! — n8., Jun 30 '16 at 16:00
@CodeDifferent I found an article on that and used `OPTION(OPTIMIZE FOR (@thisDate = '6/5/2016'))`, this let the query run in 3 seconds instead of 2 minutes. Still not as good as the <1 second string literal, but much better. — n8., Jun 30 '16 at 16:21
@n8., I suspect that statistics is outdated. Optimiser uses wrong statistics and makes wrong assumptions which lead to choosing this or that plan shape (index scan vs index seek). On top of that your tests may be affected by plan caching/parameter sniffing if you don't perform them carefully. A query with literals and `OPTION(RECOMPILE)` should produce the same plan. I don't know how you manage to get different plans. — Vladimir Baranov, Jul 01 '16 at 01:02

score 0 · Answer 1 · answered Jun 30 '16 at 16:06

The fast version does a clustered key lookup (can go right to the part of the table where that value is found).

The slow version does a non-clustered seek and then merges that with a clustered index scan (it's having to scan through the whole table).

I see the @thisDate variable is defined as Date type. Is is possible the column is defined as a DateTime? If that were true, any value in @thisDate wouldn't exactly line up with your clustered index, meaning the database will have to check through the whole table as we see here. The literal, on the other hand, would be interpreted as a DateTime value, which does match your table column type and will work with the index.

If this is right, you can fix things with a very simple change:

declare @thisDate as datetime
set @thisDate = '6/5/2016'

Only 4 characters difference.

You can also try this in conjunction with OPTION(RECOMPILE), and you may want to also take a look at OPTIMIZE FOR UNKNOWN.

This was suggested in the comments, it does not speed up the query. ~8000 rows returned in 2 minutes. — n8., Jun 30 '16 at 16:09

Variable vs string literal speed disparity

1 Answers1