
-------------------- this takes 4 secs to execute (with 2,000,000 rows) WHY? ---------------------

DECLARE @AccountId INT 
DECLARE @Max INT 
DECLARE @MailingListId INT 

SET @AccountId = 6730
SET @Max = 2000
SET @MailingListId = 82924

SELECT TOP (@Max) anp_Subscriber.Id , Name, Email 
FROM anp_Subscription WITH(NOLOCK) 
  INNER JOIN anp_Subscriber WITH(NOLOCK) 
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId 
  AND Name LIKE '%joe%' 
  AND [AccountID] = @AccountId

--------------------- this takes < 1 sec to execute (with 2,000,000 rows) -----------------------

SELECT TOP 2000 anp_Subscriber.Id ,Name, Email 
FROM anp_Subscription WITH(NOLOCK) 
  INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = 82924 
  AND Name LIKE '%joe%' 
  AND [AccountID] = 6730

Why the difference in execution time? I want to use the query at the top. Can I do anything to optimize it?

Thanks in advance! /Christian

  • I think this is related: http://stackoverflow.com/questions/414336/why-does-the-sqlserver-optimizer-get-so-confused-with-parameters – ypercubeᵀᴹ Oct 22 '11 at 15:55
  • And the MSDN article: http://technet.microsoft.com/en-gb/library/cc966425.aspx#E6TAE – ypercubeᵀᴹ Oct 22 '11 at 15:56
  • In the first case, SQL Server must optimize the query with the parameters for any possible value of those parameters. In the second case, with the actual literal values, the query optimizer can choose a more appropriate query plan - since it needs to be perfect for just those particular values. – marc_s Oct 22 '11 at 15:57
  • Could I specify the parameters more specifically in some way, maybe? – Christian V Oct 22 '11 at 16:10
  • Have you tried adding OPTION (OPTIMIZE FOR (@Max = 2000))? – GilM Oct 22 '11 at 16:30 (see the sketch below)
  • [SQL Server Query Optimization Team blog](http://blogs.msdn.com/b/queryoptteam/archive/2006/03/31/565991.aspx) – Bogdan Sahlean Oct 23 '11 at 16:19
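
Following up on GilM's comment, a minimal sketch of that hint appended to the query from the question; OPTIMIZE FOR makes the optimizer build the plan as if @Max were 2000 while still using the runtime value when executing:

-- Sketch only: original query with GilM's suggested hint appended
SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
  INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
  AND Name LIKE '%joe%'
  AND [AccountID] = @AccountId
OPTION (OPTIMIZE FOR (@Max = 2000))   -- plan is costed as if @Max were 2000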

3 Answers


Add OPTION (RECOMPILE) to the end of the query.

SQL Server doesn't "sniff" the values of local variables, so you will be getting a plan based on guessed statistics rather than one tailored to the actual variable values.
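
For reference, a sketch of the question's query with that hint appended; with RECOMPILE the statement is compiled at every execution using the current variable values, much like the literal-value version:

SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
  INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
  AND Name LIKE '%joe%'
  AND [AccountID] = @AccountId
OPTION (RECOMPILE)   -- plan is rebuilt per execution with the actual variable values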

Martin Smith
  • Thank you. This helped to improve the performance. But another question: Is it demanding on the server to do a recompile every time the query is executed? It will probably be used quite a lot... – Christian V Oct 23 '11 at 18:32
  • @user1008621 - It does seem a bit of a waste of time if you are going to be constantly recompiling with the same values; you could try `OPTION (OPTIMIZE FOR ...)` as per GilM's suggestion. – Martin Smith Oct 23 '11 at 19:22
  • I also thought so, but the only value I can optimize for in this query is @Max, and for some reason the execution time is not improved by this... Any idea why? – Christian V Oct 24 '11 at 06:49
  • @ChristianV - Not sure what you mean that's the only value you can optimize for. The syntax `OPTION (OPTIMIZE FOR (@AccountId = 6730, @Max = 2000, @MailingListId = 82924))` should work? You could also look at the difference between the plans and use explicit query hints to get the plan you want (e.g. join type or access method) – Martin Smith Oct 24 '11 at 08:18
  • Oh, I just meant that I have no idea which accountId or mailingListId is going to be used... So I don't know if it's right to optimize the query for those specific values? What do you mean by access method, by the way? – Christian V Oct 24 '11 at 10:08
  • @ChristianV - By access method I meant, for example, that you could use a hint to force a specific index. If you don't know what the values of the variables will be and you get different plans depending on the actual values, then you may well be better off just using `OPTION (RECOMPILE)`. You'd need to weigh the cost of compilation against (the cost of using an inappropriate plan × the probability of getting one) to see whether the compilation cost is worthwhile. – Martin Smith Oct 24 '11 at 10:12
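
A minimal sketch of the index hint Martin Smith mentions in the comment above; ix_Subscription_List_Account is a made-up index name standing in for whichever index the fast, literal-value plan actually uses:

-- Sketch only: force a specific (hypothetical) index on anp_Subscription
SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK, INDEX(ix_Subscription_List_Account))
  INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
  AND Name LIKE '%joe%'
  AND [AccountID] = @AccountId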

One possible item to check is whether the MailingListId and AccountId fields in the tables are of type INT. If, for example, the types are BIGINT, the query optimizer will often not use the index on those fields. When you explicitly define the values instead of using variables, the values are implicitly converted to the proper type.

Make sure the types match.
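
One way to check, sketched below: the catalog query lists the declared types of the columns used in the query, and the DECLAREs then give the variables exactly those types (BIGINT here is only an example of what you might find):

-- List the declared types of the relevant columns
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME IN ('anp_Subscription', 'anp_Subscriber')
  AND COLUMN_NAME IN ('MailingListId', 'AccountID', 'SubscriberId', 'Id')

-- If, say, the columns turn out to be BIGINT, declare the variables to match:
DECLARE @AccountId BIGINT
DECLARE @MailingListId BIGINT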

Brian Knight

The second query has to process ONLY 2000 records. Period.

The first has to process ALL records to find the maximum.

Top 2000 does not get you the highest 2000, it gets you the first 2000 of the result set - in any order.

If you want to change them to be identical, the second should read

TOP 1

and then ORDER BY anp_Subscriber.Id descending (plus the fast-first option).
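
A sketch of one way to read that suggestion in T-SQL; whether ordering by Id descending is the order the application actually wants is an assumption here:

SELECT TOP (@Max) anp_Subscriber.Id, Name, Email
FROM anp_Subscription WITH(NOLOCK)
  INNER JOIN anp_Subscriber WITH(NOLOCK)
    ON anp_Subscriber.Id = anp_Subscription.SubscriberId
WHERE [MailingListId] = @MailingListId
  AND Name LIKE '%joe%'
  AND [AccountID] = @AccountId
ORDER BY anp_Subscriber.Id DESC   -- makes the TOP deterministic
OPTION (FAST 1)                   -- "fast first rows" hint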

TomTom