0

In the following SQL, @UserExtendedSecurity is a table variable which simply provides a 1 or 0 indicator based on whether or not a user has extended security:

INSERT INTO @UserExtendedSecurity (UserId, UserName, HasExtendedSecurity)
    SELECT
        ue.Id,
        adu.FirstName + ' ' + adu.LastName,
        0
    FROM 
        UserExtension ue
    JOIN 
        ADUser adu ON ue.ADUserId = adu.Id
    WHERE 
        ue.Id NOT IN (SELECT UserId FROM @UserExtendedSecurity)

In the SQL example above, @UserExtendedSecurity already has all users with HasExtendedSecurity = 1, so I'm just selecting the remaining users into the table var with a HasExtendedSecurity value of 0. The NOT IN clause at the end does what it needs to do but is there a more proper/elegant/efficient approach to accomplish what I'm doing? Do SQL experts consider using NOT IN in this way as a red flag?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user9393635
  • 1,369
  • 4
  • 13
  • 32
  • 3
    I prefer `not exists`, but `not in` is not a "red flag". – Gordon Linoff Mar 26 '18 at 20:41
  • 2
    Use `NOT EXISTS` instead! Too many programmers have been surprised by NOT IN's behavior when it comes to null values from the subquery. – jarlh Mar 26 '18 at 20:42
  • 2
    When `UserId` is defined as `NOT NULL` it's safe to use `NOT IN`. But using it against Nullable columns should be avoided and `NOT EXISTS` used instead. – dnoeth Mar 26 '18 at 20:44
  • cool thanks for the tip. so I can replace "not in" with "not exists" without changing anything else? this would not create any unexpected side effects (and potentially avoid them) and sql experts would generally consider this better form? – user9393635 Mar 26 '18 at 20:44
  • Compare plans. Most likely it does not matter what to use, as the optimizer since 2012 will choose the best plan. – ajeh Mar 26 '18 at 20:46
  • 1
    simply replacing "not in" with "not exists" doesn't work – user9393635 Mar 26 '18 at 20:49
  • `not exists` requires a co-related subquery. `not in` does not –  Mar 26 '18 at 21:06

2 Answers2

0

You will actually get better performance with NOT EXISTS, surprisingly. Just be aware the behavior is slightly different-- it works better with null values (see this question for details).

INSERT INTO @UserExtendedSecurity
(
    UserId,
    UserName,
    HasExtendedSecurity
)
SELECT
ue.Id,
adu.FirstName + ' ' + adu.LastName,
0
FROM 
UserExtension ue
JOIN ADUser adu ON ue.ADUserId = adu.Id
WHERE NOT EXISTS
( SELECT 0 FROM @UserExtendedSecurity WHERE UserId = ue.Id)
John Wu
  • 50,556
  • 8
  • 44
  • 80
0

Basically you have 3 options:
(lt;dr; - All options are valid and all have their correct usage - but In general, I would rather go with not exists most of the times.)

NOT IN

SELECT
    ue.Id,
    adu.FirstName + ' ' + adu.LastName,
    0
FROM 
    UserExtension AS ue
JOIN 
    ADUser AS adu ON ue.ADUserId = adu.Id
WHERE 
    ue.Id NOT IN (SELECT UserId FROM @UserExtendedSecurity)

Pros:

  1. the select statement from the table variable is not related to the main select.
  2. simple, readable syntax.

Cons:

  1. null handling. When you have a null inside aNOT IN, you will not get any results, unless ansi_nulls is set to off (which you really should avoid since it's deprecated), since x NOT IN(y,NULL) is equivalent to x <> y and x <> null - and comparing to any value to null, including another null, will return unknown.
  2. With a large set inside the IN, performance might be slower compared to other options.

NOT EXISTS

SELECT
    ue.Id,
    adu.FirstName + ' ' + adu.LastName,
    0
FROM 
    UserExtension AS ue
JOIN 
    ADUser AS adu ON ue.ADUserId = adu.Id
WHERE 
    NOT EXISTS (SELECT 1 FROM @UserExtendedSecurity WHERE UserId = ue.Id)

Pros:

  1. Null is no longer an issue.
  2. Better performance than the not in alternative.

Cons:

  1. A bit more cumbersome syntax.
  2. Requires a correlated sub query.

LEFT JOIN

SELECT
    ue.Id,
    adu.FirstName + ' ' + adu.LastName,
    0
FROM 
    UserExtension AS ue
JOIN 
    ADUser AS adu ON ue.ADUserId = adu.Id
LEFT JOIN  @UserExtendedSecurity AS ues ON ues.UserId = ue.Id
WHERE ues.UserId IS NULL

Pros:

  1. No sub query is needed, correlated or not.
  2. Null values are "naturally" handled.
  3. Better performance than the not in alternative.

Cons:

  1. Might cause duplicate results.
  2. The other alternatives conveys the intent of the query better, making this alternative less readable.
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121