select statment performance degradation when using DISTINCT with parameters

Question

Note for bounty - START:

PARAMETERS SNIFFING (that is the only "idea" that was reported in pre-bounty questions) is not the issue here, as you can read in the "update" section at the end of the question. The problem is really related to how sql server creates execution plans for a parametrized query when distinct is used. I uploaded a very simple database backup (it works with sql server 2008 R2) here (you must wait 20 seconds before downloading). Against this DB you can try to run the following queries:

-- PARAMETRIZED QUERY

declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50

SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = @IS_ADMINISTRATOR OR  ROL.USER_ID = @USER_ID

-- NON PARAMETRIZED QUERY

SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!! 
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = 1 OR  ROL.USER_ID = 50

Final note: I noticed DSTINCT is the problem, my goal is to achieve the same speed (or at least almost the same speed) in both queries.

Note for bounty - END:

Original question:

I noticed that there is an heavy difference in performance between

-- Case A
select distinct * from table where id > 1

compared to (this is the sql generated by my Delphi application)

-- Case B1
exec sp_executesql N'select distinct * from table where id > @P1',N'@P1 int',1

that is equivalent to

-- Case B2
declare @P1 int
set @P1 = 1
select distinct * from table where id > @P1

A performs much faster than B1 and B2. The performance becomes the same in case I remove DISTINCT.

May you comment on this?

Here i posted a trivial query, I noticed this on a query with 3 INNER JOIN. Anyway not a complex query.

Note: I was expecting to have THE EXACT SAME PERFORMANCE, in cases A and B1/B2.

So are there some caveats in using DISTINCT?

UPDATE:

I tried to disable parameter sniffing using DBCC TRACEON (4136, -1) (the flag to disable parameter sniffing) but nothing changes. So in this case the problem is NOT LINKED TO PARAMETERS SNIFFING. Any idea?

I've never used "*" with select distinct. When you include the ID (presumably the PK?) in the select-cause, won't every row be distinct, and so asking for distinct is moot? — Tim, Dec 10 '10 at 12:56
no if I include ID (and that of course was my real case) the problem is there. — UnDiUdin, Dec 10 '10 at 13:42
I don't follow. If you include the PK in the select-clause (i.e. when you do select asterisk) there is no need for DISTINCT, right? All rows will be distinct in any case by virtue of the PK column. — Tim, Dec 11 '10 at 04:40

score 2 · Accepted Answer · edited May 23 '17 at 11:55

The problem isn't that DISTINCT is causing a performance degradation with parameters, it's that the rest of the query isn't being optimized away in the parameterized query because the optimizer won't just optimize away all of the joins using 1=@IS_ADMINISTRATOR like it will with just 1=1. It won't optimize the joins away without distinct because it needs to return duplicates based on the result of the joins.

Why? Because the execution plan tossing out all of the joins would be invalid for any value other than @IS_ADMINISTRATOR = 1. It will never generate that plan regardless of whether you are caching plans or not.

This performs as well as the non parameterized query on my 2008 server:

-- PARAMETRIZED QUERY

declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50

IF 1 = @IS_ADMINISTRATOR 
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  1 = 1
END
ELSE 
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
  DOC.DOCUMENT_ID
FROM
  DOCUMENTS DOC LEFT OUTER JOIN
  FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
  ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)   
WHERE
  ROL.USER_ID = @USER_ID
END

What's clear from the query plan I see running your example is that @IS_ADMINISTRATOR = 1 does not get optimized out the same as 1=1. In your non-parameterized example, the JOINS are completely optimized out, and it just returns every id in the DOCUMENTS table (very simple).

There are also different optimizations missing when @IS_ADMINISTRATOR <> 1. For instance, the LEFT OUTER JOINS are automatically changed to INNER JOINs without that OR clause, but they are left as-is with that or clause.

See also this answer: SQL LIKE % FOR INTEGERS for a dynamic SQL alternative.

Of course, this doesn't really explain the performance difference in your original question, since you don't have the OR in there. I assume that was an oversight.

Yes I see. Thanks for the explanation. At least now I am closer to the answer. i see clearly now that the non parametrized query performs better because the non parametrized query with 1=1 is much simpler. Then about why distinct performs worse it is an inner working of sql server, so as you suggest I shuold review the query or of course considering to send non parametrized queries. — UnDiUdin, Dec 20 '10 at 10:02

score 1 · Answer 2 · edited May 23 '17 at 11:47

1

But also see "parameter sniffing" issue.

Why does a parameterized query produces vastly slower query plan vs non-parameterized query

https://groups.google.com/group/microsoft.public.sqlserver.programming/msg/1e4a2438bed08aca?hl=de

edited May 23 '17 at 11:47

Community

1
1

answered Dec 10 '10 at 13:04

Tim

5,371
3
32
41

The 1st link you gave is very similar. Here the problem is that only with distinct I see the problem. – UnDiUdin Dec 10 '10 at 13:50
ANYWAY I tried DBCC TRACEON (4136, -1) and I have no difference. So it is another problem! – UnDiUdin Dec 10 '10 at 14:30

score 1 · Answer 3 · answered Dec 10 '10 at 13:07

1

Have you tried running your second (slower) query without dynamic SQL? Have you cleared the cache and rerun the first query? You may be experiencing parameter sniffing with the parameterized dynamic SQL query.

I think the DISTINCT is a red herring and not the actual issue.

answered Dec 10 '10 at 13:07

JNK

63,321
15
122
138

To properly test SQL, you should **never** clear the cache. You should bring the cache in a state that mimic **real** SQL executions state. A query may perform very differently with a clear cache or not, but when it is execute in a real environment, could you expect a *clear* cache? – Dec 10 '10 at 13:22
No DISTINCT makes the difference, but probably because parameters sniffing causes a "stupid" query plan only if distinct is used. Is there a way to tell SQL Server to create the same execution plan in B and A? – UnDiUdin Dec 10 '10 at 13:44
2

@Idsandon - If you are comparing two queries to see which is more performant, why would you NOT clear the cache. I asked that because if he had run query a before, and it ran quickly when he runs it again, but query B is slow the first time, it could be because of a cached query plan. Apples to Apples! – JNK Dec 10 '10 at 13:48
Moreover I cannot avoid dynamic SQL, since Delphi generates dynamic SQL for me. Of course I could avoid using ParmByName in Delphi and use text Search&Replace, but I hope there is another solution! – UnDiUdin Dec 10 '10 at 13:52

select statment performance degradation when using DISTINCT with parameters

3 Answers3

Linked