0

I am not very familiar with SQL and I am having trouble understanding what this query means. I think I understand most of it but I am confused what the IS DISTINCT is doing and why it needs to be include.

UPDATE t1
SET pop_count1=sub.pop_count1
FROM
  (SELECT state,
          count(*) AS pop_count1
   FROM t2
   WHERE stime < 3600
     AND stime + length > 0
   GROUP BY state) sub
WHERE t1.state=sub.state
  AND t1.pop_count1 IS DISTINCT FROM sub.pop_count1;
izz
  • 1
  • 3
  • `t1.pop_count1 IS DISTINCT FROM sub.pop_count1` "is distict from": three words form one operator. The operator is `<>`, but it treats NULLs as if it were values. So: `NULL IS NOT DISTINCT FROM NULL` is true. – wildplasser Jun 09 '20 at 16:09
  • But if I am setting `pop_count1=sub.pop_count1` shouldn't those always be the same – izz Jun 09 '20 at 16:12
  • 1
    The operator is `IS DISTINCT FROM` and it's documented in [the manual](https://www.postgresql.org/docs/current/functions-comparison.html#FUNCTIONS-COMPARISON-PRED-TABLE) –  Jun 09 '20 at 16:23
  • @a_horse_with_no_name I am still confused if it means 'not equal, treating null like an ordinary value' why would the sql check if pop_count1=sub.pop_count1 when the first part of the sql is setting `pop_count1=sub.pop_count1`? Am I missing something? – izz Jun 09 '20 at 16:25
  • It only does the UPDATE if the values are different –  Jun 09 '20 at 16:29
  • 1
    Does this answer your question? [How to rewrite IS DISTINCT FROM and IS NOT DISTINCT FROM?](https://stackoverflow.com/questions/10416789/how-to-rewrite-is-distinct-from-and-is-not-distinct-from) – underscore_d Jun 09 '20 at 16:32
  • @a_horse_with_no_name so if t1.pop_count1 were NULL and sub.pop_count1 were some value, then it would do the update, correct? – izz Jun 09 '20 at 16:32
  • The statement is declining to do the update when the update would be degenerate (updating the value to same thing it already is). It is a performance optimization. – jjanes Jun 09 '20 at 17:11

1 Answers1

0

IS DISTINCT FROM is an operator that is defined like this:

a IS DISTINCT FROM b

is the same as

a <> b OR (a IS NULL OR b IS NULL) AND NOT (a IS NULL AND b IS NULL)

So it is like the inequality operator <>, except that NULL is treated like a normal value unless the new value is different from the old one.

The purpose of the clause in the update statement is to not perform the update unless

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263