I'm optimizing some SQL queries (this could be considered part 2 of a question i recently posted) and replacing some NOT IN with NOT EXISTS predicates
Am I right in thinking that the main benefit to doing so is that with NOT EXISTS you get the benefit that the statement will terminate when a single match is found, but NOT IN with a counting subquery would would have to do a full table scan?
It also seems that NOT IN would also require extra work if the data selected contained NULLs, is this correct?
I need to make sure that the second statement better than the first (and functionally equivalent) in these two cases before I implement them in the proc:
Case 1:
--exclude sessions that were tracked as part of a conversion during the last response_time minutes
-- AND session_id NOT IN (SELECT DISTINCT tracked_session_id
-- FROM data.conversions WITH (NOLOCK)
-- WHERE client_id = @client_id
-- AND utc_date_completed >= DATEADD(minute, (-2) * cy.response_time, @date)
-- AND utc_date_completed <= @date
-- AND utc_date_clicked <= @date)
AND NOT EXISTS (SELECT 1
FROM data.conversions WITH (NOLOCK)
WHERE client_id = @client_id
AND utc_date_completed >= DATEADD(minute, (-2) * cy.response_time, @date)
AND utc_date_completed <= @date
AND utc_date_clicked <= @date
AND data.conversions.tracked_session_id = d.session_id
)
Case 2:
-- NOT EXISTS vs full table scan with COUNT(dashboard_id)
-- AND (SELECT COUNT(dashboard_id)
-- FROM data.dashboard_responses WITH(NOLOCK)
-- WHERE session_id = d.session_id
-- AND cycle_id = cy.id
-- AND client_id = @client_id) = 0
AND NOT EXISTS(SELECT 1
FROM data.dashboard_responses
WHERE session_id = d.session_id
AND cycle_id = cy.id
AND client_id = @client_id)
Cheers