Let me say first, this is not a request for efficiency or process change and is merely academic. I just am looking for an explanation of something I didn't expect
I have a very simple query to drive an internal report. When it comes down to it, there is a set of data loaded into a temp table that we'll call '#Sample'.
The business has asked for an exclusion to be added for a certain 'Type' of widget in this temp table. We'll call the field containing these 'Types', [exclusions].
A little more information:
- #Sample will have no more than 1200 records
- [exclusions] are varchar(2)
- There are exactly two [exclusions]
- There is a 1:1 relationship between #Sample and [Table with exclusion field]
So, basically:
Select
S.*
FROM #Sample
INNER JOIN [Table with exclusion field]
on [generic unique id]
and [exclusion] not in ('AA','AB')
The original query (basically Select * From #Sample ran in approximately 1.5 sec. The query making the original exclusions ran in about the same.
Then, in typical fashion, they would like to see a list of all records that will be excluded based on their provided types.
'Easy' I think to myself at 4pm on a Friday. The only change made was removing the 'Not' in the final join.
Select
S.*
FROM #Sample
INNER JOIN [Table with exclusion field]
on [generic unique id]
and [exclusion] in ('AA','AB')
However, when I went to produce the list of the records to be excluded, I cancelled out the query after 120 seconds, feeling this to be too long.
No worries. I went down another path conceptually and produced the list requested; However, I'm most interested in the performance difference between 'IN' and 'NOT IN'.
Ultimately I returned the [exclusion] field in the results, exported and sorted to produce the details at roughly 1.5 seconds of time to execute.
More precisely, WHY IS THERE ANY DIFFERENCE?
Thank you in advance.