1

Let me say first, this is not a request for efficiency or process change and is merely academic. I just am looking for an explanation of something I didn't expect

I have a very simple query to drive an internal report. When it comes down to it, there is a set of data loaded into a temp table that we'll call '#Sample'.

The business has asked for an exclusion to be added for a certain 'Type' of widget in this temp table. We'll call the field containing these 'Types', [exclusions].

A little more information:

  • #Sample will have no more than 1200 records
  • [exclusions] are varchar(2)
  • There are exactly two [exclusions]
  • There is a 1:1 relationship between #Sample and [Table with exclusion field]

So, basically:

Select
S.*
FROM #Sample
INNER JOIN [Table with exclusion field]
   on [generic unique id]
   and [exclusion] not in ('AA','AB')

The original query (basically Select * From #Sample ran in approximately 1.5 sec. The query making the original exclusions ran in about the same.

Then, in typical fashion, they would like to see a list of all records that will be excluded based on their provided types.

'Easy' I think to myself at 4pm on a Friday. The only change made was removing the 'Not' in the final join.

Select
S.*
FROM #Sample
INNER JOIN [Table with exclusion field]
   on [generic unique id]
   and [exclusion] in ('AA','AB')

However, when I went to produce the list of the records to be excluded, I cancelled out the query after 120 seconds, feeling this to be too long.

No worries. I went down another path conceptually and produced the list requested; However, I'm most interested in the performance difference between 'IN' and 'NOT IN'.

Ultimately I returned the [exclusion] field in the results, exported and sorted to produce the details at roughly 1.5 seconds of time to execute.

More precisely, WHY IS THERE ANY DIFFERENCE?

Thank you in advance.

BrownRedHawk
  • 113
  • 6
  • What was the difference in the execution plans? – Tab Alleman Mar 06 '15 at 21:30
  • Because I was not willing to commit an unknown number of minutes to the query, I was only able to compare the 'Estimated Execution Plan' and they were identical. – BrownRedHawk Mar 06 '15 at 21:32
  • This link: http://stackoverflow.com/questions/2065329/sql-server-in-vs-exists-performance has pretty good explanations. It may not directly answer your question though. – Phoenix Mar 06 '15 at 21:38
  • DB engine based on statistics will attempt to utilize the most optimal method of returning results. It's possible it believes it would be faster to do the join before running the excluding criteria in one case, and in the other, run the excluding criteria and then the join... Is there a 1-1 relationship between uniqueID in #sample and exclusion table or is there a 1-Many? – xQbert Mar 06 '15 at 21:38
  • I updated the question to answer these questions. Thanks xQbert – BrownRedHawk Mar 06 '15 at 21:45
  • zerkms - The 'NOT IN' version was the FAST one. It was 'IN' that took forever. – BrownRedHawk Mar 06 '15 at 21:51
  • Not answerable at the moment. Could have been something external like a lock from a different transaction. – Martin Smith Mar 06 '15 at 21:53
  • Do you by chance have any filtered indexes? SQL server will only use filtered indexes if the query has matching filters. Throwing that out as a long shot. – jtimperley Mar 06 '15 at 21:54
  • I'll check on the filtered indexes. Also, I doubt it was external locks, as I tried it at multiple different times. Additionally, these tables are only modified or updated overnight. If I was querying at 12:55am then it might have been an issue. – BrownRedHawk Mar 06 '15 at 21:56

1 Answers1

0

SQL Server works best when it operate on sets not row by agonizing row (RBAR). Usually when there is a 'not' in a query it results in RBAR instead of SARGABLE queries. If a 'not' can be avoided it should be and will result in much faster results.

benjamin moskovits
  • 5,261
  • 1
  • 12
  • 22
  • The "not in" was the faster one. This is counter to your suggestion, as accurate as it may be. – BrownRedHawk Mar 06 '15 at 22:05
  • 1
    A good way of comparing speeds is to use SET STATISTICS IO ON and set statistics time on before you run each query to see how long the query takes. You may want to learn how to read a proposed and actual query plan. I suspect that the reason the not in seemed to run faster was that the select in version returned so many rows that the client took a long time to receive all the rows. If you want to see how fast each one runs (compared to each other) run a select count(*) which would return a single row. – benjamin moskovits Mar 07 '15 at 00:05