66

How do you rewrite expressions containing the standard IS DISTINCT FROM and IS NOT DISTINCT FROM operators for a SQL implementation that does not support them, such as Microsoft SQL Server 2008 R2?

binki
Jason Kresowaty

10 Answers

63

The IS DISTINCT FROM predicate was introduced as feature T151 of SQL:1999, and its readable negation, IS NOT DISTINCT FROM, was added as feature T152 of SQL:2003. The purpose of these predicates is to guarantee that the result of comparing two values is either True or False, never Unknown.

These predicates work with any comparable type (including rows, arrays and multisets), which makes it rather complicated to emulate them exactly. However, SQL Server doesn't support most of these types, so we can get pretty far by checking for null arguments/operands:

  • a IS DISTINCT FROM b can be rewritten as:

    ((a <> b OR a IS NULL OR b IS NULL) AND NOT (a IS NULL AND b IS NULL))
    
  • a IS NOT DISTINCT FROM b can be rewritten as:

    (NOT (a <> b OR a IS NULL OR b IS NULL) OR (a IS NULL AND b IS NULL))
    

Your own answer is incorrect as it fails to consider that FALSE OR NULL evaluates to Unknown. For example, NULL IS DISTINCT FROM NULL should evaluate to False. Similarly, 1 IS NOT DISTINCT FROM NULL should evaluate to False. In both cases, your expressions yield Unknown.
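
The difference can be checked mechanically. What follows is a minimal sketch, not part of the original answer: it assumes SQLite (through Python's sqlite3 module) as a stand-in engine, because SQLite follows the same three-valued logic for these operators and has a built-in null-safe operator, IS NOT, that can serve as the reference. The names evaluate, TWO_VALUED, THREE_VALUED and REFERENCE are made up for this illustration.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

def evaluate(expr, a, b):
    """Evaluate a SQL boolean expression over :a and :b.

    SQLite reports True as 1, False as 0 and Unknown as NULL (None in Python).
    """
    return conn.execute(f"SELECT {expr}", {"a": a, "b": b}).fetchone()[0]

# Rewrite of "a IS DISTINCT FROM b" given in this answer.
TWO_VALUED = (
    "((:a <> :b OR :a IS NULL OR :b IS NULL)"
    " AND NOT (:a IS NULL AND :b IS NULL))"
)
# Shorter rewrite that leaves the bare comparison unguarded.
THREE_VALUED = (
    "((:a IS NULL AND :b IS NOT NULL)"
    " OR (:a IS NOT NULL AND :b IS NULL)"
    " OR (:a <> :b))"
)
# SQLite's own null-safe operator, used here only as the reference.
REFERENCE = ":a IS NOT :b"

# Print reference, two-valued and three-valued results side by side.
for a, b in product([None, 1, 2], repeat=2):
    print(a, b, evaluate(REFERENCE, a, b),
          evaluate(TWO_VALUED, a, b), evaluate(THREE_VALUED, a, b))
```

In this sketch the two-valued rewrite agrees with the reference for every pair, while the shorter rewrite returns None (Unknown) when both operands are NULL.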

Chris Bandy
  • 1
    ((a <> b OR a IS NULL OR b IS NULL) AND NOT (a IS NULL AND b IS NULL)): can't we just use a <> b OR a IS NULL XOR b IS NULL? – Rami Jamleh Jun 16 '15 at 07:56
  • Why can't we rewrite `a IS NOT DISTINCT FROM b` as `a = b OR a IS NULL and b IS NULL`? Seems much more concise that way. – Rudey Jun 20 '19 at 07:13
  • 2
    @Rudey because when only one operand is null, `a = b` evaluates to null which then causes the entire expression to evaluate to null. – Chris Bandy Jun 21 '19 at 13:11
  • Is `IS DISTINCT FROM` implemented in sql server 2019? It is not implemented in sql server 2017. – boggy Mar 31 '20 at 23:16
  • 2
    @costa No, SQL Server 2019 still does not support `IS DISTINCT FROM` - nor does it have **any** way to succinctly perform NULL-safe comparisons (other than using set-operations in predicates, which is impractical for scalar comparisons). – Dai Jun 05 '20 at 06:35
  • Can you provide an example of when the distinction between *False* and *Unknown* matters? `OR` is not infected by *Unknown* and you even take advantage of that in your expression. For now I am continuing to use `(a IS NULL AND b IS NULL OR a = b)` for `a IS NOT DISTINCT FROM b`. – binki Apr 04 '22 at 17:21
  • 1
    @binki Aside from the obvious `NOT` (negation), I cannot in SQL Server. The lack of T031 boolean type makes it difficult to use boolean expressions. – Chris Bandy Apr 09 '22 at 01:15
  • @ChrisBandy Thanks, `NOT` is the one example that I have encountered before but could not remember. Being able to `NOT` the expression blindly does matter to me, so I might need to use this pattern or one of the set-based options in the future. – binki Apr 09 '22 at 03:23
47

Another solution I like leverages the true two-value boolean result of EXISTS combined with INTERSECT. This solution should work in SQL Server 2005+.

  • a IS NOT DISTINCT FROM b can be written as:

    EXISTS(SELECT a INTERSECT SELECT b)

As documented, INTERSECT treats two NULL values as equal, so if both are NULL, then INTERSECT results in a single row, thus EXISTS yields true.

  • a IS DISTINCT FROM b can be written as:

    NOT EXISTS(SELECT a INTERSECT SELECT b)

This approach is much more concise if you have multiple nullable columns you need to compare in two tables. For example, to return rows in TableB that have different values for Col1, Col2, or Col3 than TableA, the following can be used:

SELECT *
FROM TableA A
   INNER JOIN TableB B ON A.PK = B.PK
WHERE NOT EXISTS(
   SELECT A.Col1, A.Col2, A.Col3
   INTERSECT
   SELECT B.Col1, B.Col2, B.Col3);

Paul White explains this workaround in more detail: https://sql.kiwi/2011/06/undocumented-query-plans-equality-comparisons.html
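
To try the multi-column comparison without a SQL Server instance, here is a small sketch that runs the same query shape against SQLite through Python's sqlite3 module. It rests on the assumption that SQLite's INTERSECT treats NULLs as equal in the same way; the table contents are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (PK INTEGER PRIMARY KEY, Col1 INT, Col2 INT, Col3 INT);
    CREATE TABLE TableB (PK INTEGER PRIMARY KEY, Col1 INT, Col2 INT, Col3 INT);
    -- PK 1: identical rows, including a NULL in Col2
    -- PK 2: NULL on one side only
    -- PK 3: ordinary value difference in Col3
    INSERT INTO TableA VALUES (1, 10, NULL, 30), (2, 10, NULL, 30), (3, 10, 20, 30);
    INSERT INTO TableB VALUES (1, 10, NULL, 30), (2, 10, 20, 30), (3, 10, 20, 31);
""")

# Same predicate as in the answer: keep the pairs whose column lists differ.
changed = [row[0] for row in conn.execute("""
    SELECT A.PK
    FROM TableA A
       INNER JOIN TableB B ON A.PK = B.PK
    WHERE NOT EXISTS(
       SELECT A.Col1, A.Col2, A.Col3
       INTERSECT
       SELECT B.Col1, B.Col2, B.Col3)
    ORDER BY A.PK
""")]
print(changed)  # [2, 3]
```

Row 1 is not reported even though Col2 is NULL on both sides, which is the behaviour a plain A.Col2 <> B.Col2 comparison cannot give.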

Community
John Keller
  • 6
    This should be the accepted answer, as it rewrites the predicate in a way that doesn't duplicate references to `a` and `b`. For nondeterministic expressions `a` and `b`, or expressions with side-effects (such as logging), that would be very useful. Your second example also emulates `(A.Col1, A.Col2, A.Col3) IS DISTINCT FROM (B.Col1, B.Col2, B.Col3)`, which is only supported natively by PostgreSQL (to my knowledge). A very useful predicate, at times. – Lukas Eder Aug 08 '14 at 19:39
  • Thumbs up for `exists(...intersect...)` idea. Useful when a and b are long expressions. – Tomáš Záluský Jun 13 '19 at 15:24
  • I modified some of my queries that had predicates similar to other answers here. I concluded that using `INTERSECT` results in a much faster query. Thanks for sharing! – Rudey Jan 13 '20 at 16:33
13

If your SQL implementation does not implement the SQL standard IS DISTINCT FROM and IS NOT DISTINCT FROM operators, you can rewrite expressions containing them using the following equivalencies:

In general:

a IS DISTINCT FROM b <==>
(
    ((a) IS NULL AND (b) IS NOT NULL)
OR
    ((a) IS NOT NULL AND (b) IS NULL)
OR
    ((a) <> (b))
)

a IS NOT DISTINCT FROM b <==>
(
    ((a) IS NULL AND (b) IS NULL)
OR
    ((a) = (b))
)

This answer is incorrect when used in a context where the difference between UNKNOWN and FALSE matters. I think that is uncommon, though. See the accepted answer by @ChrisBandy.
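
One context where the difference shows up is negation. The sketch below is only an illustration, run on SQLite through Python's sqlite3 module on the assumption that its three-valued logic matches SQL Server here. The table t, its rows and the helper ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT);
    INSERT INTO t VALUES (1, NULL, NULL), (2, 1, 1), (3, 1, NULL), (4, 1, 2);
""")

# The "in general" rewrite of a IS DISTINCT FROM b from this answer.
IS_DISTINCT = """(
    (a IS NULL AND b IS NOT NULL)
 OR (a IS NOT NULL AND b IS NULL)
 OR (a <> b)
)"""

def ids(where):
    """Return the ids of the rows that satisfy the given WHERE condition."""
    return [r[0] for r in conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")]

distinct_ids = ids(IS_DISTINCT)               # rows 3 and 4
not_distinct_ids = ids(f"NOT {IS_DISTINCT}")  # row 2 only
print(distinct_ids, not_distinct_ids)
```

Row 1, where both columns are NULL, is returned by neither query: the rewrite evaluates to Unknown for it, and NOT Unknown is still Unknown. A true IS DISTINCT FROM would place it in the second result.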

If a placeholder value can be identified that does not actually occur in the data, then COALESCE is an alternative:

a IS DISTINCT FROM b <==> COALESCE(a, placeholder) <> COALESCE(b, placeholder)
a IS NOT DISTINCT FROM b <==> COALESCE(a, placeholder) = COALESCE(b, placeholder)
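
The requirement that the placeholder never occurs in the data is easy to underestimate. A small sketch of what goes wrong otherwise, using SQLite through Python's sqlite3 module; the sentinel value -1 and the helper name coalesce_equal are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
PLACEHOLDER = -1  # hypothetical sentinel, assumed never to occur in real data

def coalesce_equal(a, b):
    """COALESCE-based emulation of a IS NOT DISTINCT FROM b (1 = true, 0 = false)."""
    sql = "SELECT COALESCE(:a, :p) = COALESCE(:b, :p)"
    return conn.execute(sql, {"a": a, "b": b, "p": PLACEHOLDER}).fetchone()[0]

print(coalesce_equal(None, None))  # 1: both NULL, treated as not distinct
print(coalesce_equal(1, None))     # 0: value vs NULL, treated as distinct
print(coalesce_equal(-1, None))    # 1: wrong, the data collided with the placeholder
```

The last call shows the failure mode: once a real value equals the placeholder, that value and NULL become indistinguishable.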
Lukas Eder
Jason Kresowaty
  • 8
    This is a wrong answer though. See the last paragraph in Chris' answer. – ypercubeᵀᴹ Oct 22 '13 at 22:34
  • 1
    Yes, this answer is incorrect when used in a context where the difference between UNKNOWN and FALSE matters. I think that is uncommon, though. – Jason Kresowaty Oct 24 '13 at 22:42
  • 6
    @JasonKresowaty: This isn't uncommon at all. In any predicate like `(a IS DISTINCT FROM b) AND something`, the distinction between `UNKNOWN` and `FALSE` is essential. If `a` and `b` are both `NULL`, then your emulation will generate `NULL`, regardless if `something` is `TRUE` or `FALSE`. – Lukas Eder Aug 09 '14 at 07:54
  • You can also use `coalesce(a = b, a is null and b is null)` to test if they're the same, thus (a IS NOT DISTINCT FROM b) – bart Jun 28 '17 at 08:06
  • For some reason the first approach in this answer (using logic) is an order of magnitude slower than the second (using functions) for me (SQL Server). – Denziloe Feb 27 '19 at 16:06
9

Just to extend John Keller's answer: I prefer to use the EXISTS and EXCEPT pattern:

a IS DISTINCT FROM b
<=>
EXISTS (SELECT a EXCEPT SELECT b)
-- NOT EXISTS (SELECT a INTERSECT SELECT b)

and

a IS NOT DISTINCT FROM b
<=>
NOT EXISTS (SELECT a EXCEPT SELECT b)
-- EXISTS (SELECT a INTERSECT SELECT b)

for one particular reason: with EXCEPT, the NOT in the rewrite lines up with the NOT in the predicate being emulated, whereas with INTERSECT it is inverted.


SELECT 1 AS PK, 21 AS c, NULL AS b
INTO tab1;

SELECT 1 AS PK, 21 AS c, 2 AS b
INTO tab2;

SELECT *
FROM tab1 A
JOIN tab2 B ON A.PK = B.PK
WHERE EXISTS(SELECT A.c, A.b
             EXCEPT
             SELECT B.c, B.b);

DBFiddle Demo
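
The claim that the EXCEPT and INTERSECT forms are interchangeable can be checked over a small truth table. This is a sketch only, run on SQLite through Python's sqlite3 module under the assumption that its set operators treat NULLs as equal, as SQL Server's do. The names truth, EXCEPT_FORM and INTERSECT_FORM are made up.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

def truth(expr, a, b):
    """Evaluate an EXISTS-based predicate over :a and :b as a Python bool."""
    return bool(conn.execute(f"SELECT {expr}", {"a": a, "b": b}).fetchone()[0])

# Two spellings of a IS DISTINCT FROM b.
EXCEPT_FORM = "EXISTS (SELECT :a EXCEPT SELECT :b)"
INTERSECT_FORM = "NOT EXISTS (SELECT :a INTERSECT SELECT :b)"

results = {
    (a, b): (truth(EXCEPT_FORM, a, b), truth(INTERSECT_FORM, a, b))
    for a, b in product([None, 1, 2], repeat=2)
}
for pair, outcome in results.items():
    print(pair, outcome)
```

Both forms agree on every pair in this sketch, and because EXISTS never returns Unknown, either one can be negated safely.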

Lukasz Szozda
8

One caveat when rewriting IS DISTINCT FROM and IS NOT DISTINCT FROM is that the rewrite should not interfere with index use, at least on SQL Server. In other words, when using the following:

WHERE COALESCE(@input, x) = COALESCE(column, x)

SQL Server won't be able to seek on any index that includes column, because the column is wrapped in a function. So in a WHERE clause, it would be preferable to use the form

WHERE @input = column OR (@input IS NULL AND column IS NULL)

to take advantage of any indexes for column. (Parens only used for clarity)

Boyd
  • 2
    +1 for mentioning for how functions kill the use of the index. It's how I ended up here in the first place. – Serge Mar 30 '16 at 16:11
3

We are happy to announce that IS [NOT] DISTINCT FROM is now supported as of SQL Server 2022 CTP 2.1 (and the cloud versions as well). So, hopefully the workarounds are no longer needed generally, though they will still work. Documentation Page Link

Conor Cunningham MSFT
1

Spelling it out using CASE

For reference, the most canonical (and readable) implementation of IS [ NOT ] DISTINCT FROM would be a well-formatted CASE expression. For IS DISTINCT FROM:

CASE WHEN [a] IS     NULL AND [b] IS     NULL THEN 0 -- FALSE
     WHEN [a] IS     NULL AND [b] IS NOT NULL THEN 1 -- TRUE
     WHEN [a] IS NOT NULL AND [b] IS     NULL THEN 1 -- TRUE
     WHEN [a] =               [b]             THEN 0 -- FALSE
     ELSE                                          1 -- TRUE
END

Obviously, other solutions (specifically John Keller's, using INTERSECT) are more concise.

More details here.

Using DECODE if available

I know this question is about SQL Server, but for completeness' sake, Db2 and Oracle support a DECODE() function, with which the predicates can be emulated as follows:

-- a IS DISTINCT FROM b
DECODE(a, b, 1, 0) = 0

-- a IS NOT DISTINCT FROM b
DECODE(a, b, 1, 0) = 1
Lukas Eder
  • `FALSE` and `TRUE` are not constants in SQL Server. You pulled an example for MySQL. – binki Apr 04 '22 at 17:17
  • @binki: Yes you're right. But the question was *also* about ansi-sql (see tag), and generally database products *"such as SQL Server"* – Lukas Eder Apr 05 '22 at 06:10
  • I think the ansi-sql tag is just there to refer to the fact that the ANSI standard defines `IS DISTINCT FROM`. The tag sql is probably there just because that is the broader topic. Three of the tags are about SQL Server: tsql, sql-server, and sql-server-2008-r2. The question itself says “such as”, unfortunately, but it also specifies an exact SQL Server version and that is how everyone else interpreted it. – binki Apr 05 '22 at 14:39
  • @binki: Eh, I just tried to provide something useful, you know. Whatever, here's the updated answer. – Lukas Eder Apr 05 '22 at 15:10
0

These expressions can be a good substitute for the IS DISTINCT FROM logic and perform better than the previous examples because SQL Server compiles them into a single predicate expression, which results in approximately half the operator cost on a filter expression. They are essentially the same as the solutions provided by Chris Bandy, but they use nested ISNULL and NULLIF functions to perform the underlying comparisons.

(... obviously ISNULL could be substituted with COALESCE if you prefer)

  • a IS DISTINCT FROM b can be rewritten as:

    ISNULL(NULLIF(a, b), NULLIF(b, a)) IS NOT NULL

  • a IS NOT DISTINCT FROM b can be rewritten as:

    ISNULL(NULLIF(a, b), NULLIF(b, a)) IS NULL
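
A quick way to convince yourself that the NULLIF form is two-valued is to compare it with a null-safe reference over all combinations. The sketch below does this on SQLite through Python's sqlite3 module. It uses COALESCE in place of ISNULL (the substitution mentioned above, needed here because SQLite has no two-argument ISNULL) and SQLite's IS / IS NOT operators as the reference. All Python names are made up for the illustration.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

def evaluate(expr, a, b):
    """Return 1 (True), 0 (False) or None (Unknown) for a SQL expression over :a and :b."""
    return conn.execute(f"SELECT {expr}", {"a": a, "b": b}).fetchone()[0]

# NULLIF-based rewrites, with COALESCE standing in for ISNULL.
NULLIF_DISTINCT = "COALESCE(NULLIF(:a, :b), NULLIF(:b, :a)) IS NOT NULL"
NULLIF_NOT_DISTINCT = "COALESCE(NULLIF(:a, :b), NULLIF(:b, :a)) IS NULL"

table = {
    (a, b): (evaluate(NULLIF_DISTINCT, a, b), evaluate(NULLIF_NOT_DISTINCT, a, b))
    for a, b in product([None, 1, 2], repeat=2)
}
for pair, outcome in table.items():
    print(pair, outcome)
```

Every entry is 0 or 1, never None, and the two columns are always complementary.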

Jason
0
a IS NOT DISTINCT FROM b

can be rewritten as:

(a IS NOT NULL AND b IS NOT NULL AND a=b) OR (a IS NULL AND b IS NULL)

a IS DISTINCT FROM b

can be rewritten as:

NOT ((a IS NOT NULL AND b IS NOT NULL AND a=b) OR (a IS NULL AND b IS NULL))
Pang
wojtek
0

This is an old question and there is a new answer. It is easier to understand and maintain.

-- a IS DISTINCT FROM b
CASE WHEN (a = b) OR (a IS NULL AND b IS NULL) THEN 1 ELSE 0 END = 0

-- a IS NOT DISTINCT FROM b
CASE WHEN (a = b) OR (a IS NULL AND b IS NULL) THEN 1 ELSE 0 END = 1

It should be noted that this alternative to the IS [NOT] DISTINCT FROM syntax works in all major SQL databases (see link at the end). This and the alternatives are explained in detail here.

oᴉɹǝɥɔ