NOT EXISTS vs NOT attribute= ANY vs NOT IN

Question

After I read this post, I found a third way to achieve the query with NOT ANY. Does it really makes a difference?

Which of these queries run the faster?

NOT EXISTS

SELECT DISTINCT U.no FROM U
INNER JOIN PUF ON PUF.noU = U.no
WHERE NOT EXISTS (
    SELECT DISTINCT PUF.noU FROM F
    INNER JOIN PUF AS PUF2 ON PUF2.noF = F.no
    WHERE VILLE <> 'Lausanne' AND PUF.noU = PUF2.noU
)

NOT attribute = ANY

SELECT DISTINCT U.no FROM U
INNER JOIN PUF ON PUF.noU = U.no
WHERE NOT PUF.noU = ANY (
    SELECT DISTINCT PUF.noU FROM F
    INNER JOIN PUF ON PUF.noF = F.no
    WHERE VILLE <> 'Lausanne'
)

NOT IN (The initial one)

SELECT DISTINCT U.no FROM U
INNER JOIN PUF ON PUF.noU = U.no
WHERE PUF.noU NOT IN (
    SELECT DISTINCT PUF.noU FROM F
    INNER JOIN PUF ON PUF.noF = F.no
    WHERE VILLE <> 'Lausanne'
)

If they're equivalent, should I use NOT EXISTS instead of NOT attribute = ANY () ?

Using `exists` is almost always the best choice in my experience - so in other words, none of your examples. — Stu, Feb 06 '22 at 20:47
The linked post is about SQL Server. While some basic principals are the same, syntax details and performance characteristics differ in Postgres. — Erwin Brandstetter, Feb 07 '22 at 01:22

Erwin Brandstetter · Answer 1 · 2022-02-07T13:47:56.837

The one construct to avoid in most cases is ~~NOT IN~~, especially ~~NOT IN (subquery)~~.
~~NOT attribute = ANY (subquery)~~ is equally inferior.

Those yield "surprising" results when involving NULL values, and there is almost always a superior (correct, faster, less deceiving) formulation with NOT EXISTS or LEFT JOIN / IS NULL. See:

Select rows which are not present in other table

Either way, DISTINCT is useless noise inside an EXISTS expression. Postgres stops evaluation after the first find anyway.
You can probably simplify to this:

SELECT DISTINCT U.no
FROM   U
JOIN   PUF ON PUF.noU = U.no
WHERE  NOT EXISTS (
   SELECT FROM F
   WHERE  F.no = PUF.noF
   AND    F.VILLE <> 'Lausanne'
   );

Maybe you can simplify further. The JOIN may not be needed, either. I would have to know exact table definitions and the exact intention of the query to be sure.

That said, the proof of the pudding is in the eating. After verifying that all variants are indeed equivalent (NULL values?), test performance for each (properly) and compare. See:

Get execution time of PostgreSQL query

NOT EXISTS vs NOT attribute= ANY vs NOT IN

1 Answers1