4

Trying to understand NOT EXISTS better. Can we always replace NOT EXISTS when we have NOT IN, even with nested situation?

I found this similar question, it only has one NOT IN while trying to do with the nested case. We have two tables, registered and preActivity .

Registered has mId (string), aId (string), quarter (string), year (integer) and preActivity has aId (string), preAId (string) where

> mId is member id, 
> aId is the activity Id, 
> preAId is the prerequisite activity Id.

If we have this query with nested NOT IN to find out all the members have registered all the required activities(prerequisite) class before for activity (class) swimming at YMCA. Can we convert it with to two nested NOT EXIST?

SELECT DISTINCT r.mid
FROM registered r
WHERE r.mid NOT IN (SELECT r.mid
                    FROM preActivity p
                    WHERE p.aid = "swimming" AND 
                    p.preAId NOT IN (SELECT r2.mid
                                     FROM registered r2
                                     WHERE r2.mid = r.mid));

Using the hint for this post, we can convert one of the NOT IN, but the second one taking me hours. Can someone please help with some explanation ?

Here is what I have so far:

SELECT DISTINCT r.mid
FROM registered r
WHERE NOT EXISTS (SELECT r.mid
                  FROM preActivity p
                  WHERE p.aid = "swimming" AND 
                  p.preAId NOT IN (SELECT r2.mid  # how can we compare p.preAId with some rows selected from r2 Notice we don't have preAid field from resistered table (following the idea from the post? 
                                   FROM registered r2
                                   WHERE r2.mid = r.mid));

Or we can't apply the same idea here since it is a two nested case ?

Tomer Shetah
  • 8,413
  • 7
  • 27
  • 35
mmmm
  • 81
  • 7

1 Answers1

2

First thing to remember: the SELECT in an [NOT] EXISTS query doesn't matter, as we're only looking for the existence of rows. You could even write SELECT 1/0 and not get an error. So most people write [NOT] EXISTS (SELECT 1. (I like to put that all on one line and leave the rest of the subquery on new lines)

Secondly, a NOT IN query can have issues surrounding null columns, so it's best to always write a NOT EXISTS instead.


Now, if you analyze an [NOT] IN query, you will see that the semi-join is on the column just before with the column in the SELECT. So a query:

X.colA [NOT] IN
    (SELECT Y.colA FROM Y)

can always be converted to

[NOT] EXISTS (SELECT 1
    FROM Y
    WHERE Y.colA = X.colA)

Another interesting syntax, most useful with multi-column joins or nullable columns, is:

[NOT] EXISTS (
    SELECT X.colA
    INTERSECT
    SELECT Y.colA
    FROM Y)

Don't forget to always use the correct table alias on the subquery columns, if you get this wrong then your query can return incorrect results without you noticing.

For example, what happens here?

[NOT] EXISTS (SELECT 1
    FROM Y
    WHERE X.colA = colA)

In your case, your first NOT IN query is slightly weird.

You are putting r.mid on both sides of the join, so effectively this becomes an EXISTS anyway.

So your query can be rewritten as this:

select distinct r.mid
    from registered r
    where not exists (select 1
        From preActivity p
        where p.aid = "swimming" and 
        not exists (select 1
            From registered r2
            where r2.mid = r.mid and r2.mid = p.preAId
        )
    );
Charlieface
  • 52,284
  • 6
  • 19
  • 43