0

I'm debugging the following SQL statement, trying to understand how it's behaving.

I'm surprised to find that if I change the NOT EXISTS to just EXISTS (and querying against the same, unchanged data), I get the exact same output (which is a count of the rows, e.g., 237). How can this be?

I expected that changing the NOT EXISTS to just EXISTS would change it from returning a positive number of rows (e.g., 237) to returning 0.

SELECT count(*) FROM blog_tags
WHERE blog_tags.subscribed = true
AND blog_tags.special = true
AND EXISTS (
    SELECT 1
    FROM tags
    INNER JOIN blog_tags AS bt ON bt.tag_id = tags.id
    INNER JOIN blogs ON bt.blog_id = blogs.id
    WHERE blogs.org_id = 22
    AND NOT EXISTS ( /* Removing the "NOT" on this line has no effect */
        SELECT 1
        FROM blog_tags
        INNER JOIN tags AS tg ON blog_tags.tag_id = tg.id
        INNER JOIN blogs AS t ON blog_tags.blog_id = t.id
        WHERE t.org_id = 4
        AND t.active = true
        AND t.type = 'foo'
        AND t.priority_id = blogs.priority_id
        AND tg.name = tags.name
    )
);

I'm wondering if I'm conceptually understanding this incorrectly. Re-writing it as psuedo-code:

/* select_1 */
SELECT count(*) FROM sometable_1
WHERE condition_1a
AND condition_1b
AND EXISTS (
    /* condition_1c (the entire EXISTS inside these parentheses) */
    /* select_2 */
    SELECT 1
    FROM sometable2
    INNER JOIN join_expression_1a
    INNER JOIN join_expression_1b
    WHERE condition_2a
    AND NOT EXISTS ( /* Removing the "NOT" on this line has no effect */
        /* condition_2b (the entire NOT EXISTS inside these parentheses */
        /* select_3 */
        SELECT 1
        FROM sometable1
        INNER JOIN join_expression_2a
        INNER JOIN join_expression_2b
        WHERE condition_3a
        AND condition_3b
        AND condition_3c
        AND condition_3d
        AND condition_3e
    )
);

Following are my interpretations of the above psuedo-code. Are these interpretations true?

  1. count(*) can only return a non-zero number of rows if (condition_1a AND condition_1b AND condition_1c) are True
  2. condition_1c is only True if (condition_2a=True AND condition_2b=False)
  3. if the entire expression returns a non-zero number of rows, then condition_2b must be False in order for the NOT EXISTS to be True.
  4. if the entire expression returns a non-zero number of rows, then changing NOT EXISTS to EXISTS should cause the entire expression to return 0.

I'm using PostgreSQL v9.2.8

Rob Bednark
  • 25,981
  • 23
  • 80
  • 125
  • Can it be coincidence? There might be the same number of qualifying rows either way. – Erwin Brandstetter Aug 12 '14 at 23:36
  • 3
    A small amount of sample data to reproduce the problem would be interesting and helpful. Are you certain that the join conditions or the other part of the WHERE clause aren't behind the problem? – mu is too short Aug 12 '14 at 23:44
  • My guess is that `org_id` is never 22. – Gordon Linoff Aug 13 '14 at 00:24
  • 1
    Did you try running just the outer subquery on it's own, using both exists and not exists? Did you get the same number of rows each time? – Dan Bracuk Aug 13 '14 at 00:49
  • @GordonLinoff: If `org_id` is never 22, the count is 0, not 237. – Erwin Brandstetter Aug 13 '14 at 03:29
  • Agree with @DanBracuk, change both the "SELECT 1" to "SELECT *" and see what actual data comes back. The fields that are matched between SELECTs, ie blogs.priority_id and tags.name probably have data that matches both way. – simo.3792 Aug 13 '14 at 03:30
  • I have now modified the post and added psuedo-code and 4 of my interpretations of what the code is doing. Please check my interpretations and indicate whether they are correct/incorrect. I'm working on mu_is_too_short's suggestion of reproducing with a small amount of sample data, and will also try @Dan_Bracuk's suggestion of doing just the outer subquery on it's own. – Rob Bednark Aug 13 '14 at 04:07
  • The entire query returns `274256` (not 237 as I gave for an example). The outer subquery returns `87` with `NOT EXISTS` and `274259` for `EXISTS`. – Rob Bednark Aug 13 '14 at 04:59

2 Answers2

2

As to your "interpretations" added in the question update:

1. count(*) can only return a non-zero number of rows if (condition_1a AND condition_1b AND condition_1c) are True

count(*) never returns NULL, but returns zero (0), when now rows are found. This makes it special among standard aggregate functions. Per documentation:

It should be noted that except for count, these functions return a null value when no rows are selected.

You probably mean:

count(*) can only return a non-zero number of rows

But you are also fuzzy on the sequence of events. WHERE and JOIN conditions are evaluated for each individual input row. The aggregate function count(*) is evaluated after that. Consider the sequence of events in a SELECT query:

A correct sentence would be:

count(*) can only return a non-zero number if (condition_1a AND condition_1b AND condition_1c) evaluate to TRUE for one or more of the input rows.

2. condition_1c is only True if (condition_2a=True AND condition_2b=False)

Correct.

3. if the entire expression returns a non-zero number of rows, then condition_2b must be False in order for the NOT EXISTS to be True.

See 1. Also, if your EXISTS expression is not constant (referencing a column of the outer query or calling any volatile function) the result of the EXISTS expression can be different for each input row.

4. if the entire expression returns a non-zero number of rows, then changing NOT EXISTS to EXISTS should cause the entire expression to return 0.

Incorrect - if the EXISTS expression is not constant. See 3. Changing NOT EXISTS to EXISTS can result in any number of rows.

Given that you are building on incorrect assumptions, I suggest you reassess your findings and come back with a SSCCE if you can.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
2
...
    AND NOT EXISTS ( /* Removing the "NOT" on this line has no effect */
        /* condition_2b (the entire NOT EXISTS inside these parentheses */
        /* select_3 */
        SELECT 1
        FROM sometable1
        INNER JOIN join_expression_2a
        INNER JOIN join_expression_2b
        WHERE condition_3a
        AND condition_3b
        AND condition_3c
        AND condition_3d    --- this condition links select_2 to select_3
        AND condition_3e    --- this condition links select_2 to select_3
    )
);

condition_3d and condition_3e link select_2 with select_3, but it is a pretty loose coupling as the priority_id and name might be linked to different blogs and tags respectively. Without seeing the actual data, I suggest that there probably needs to be a closer linking between select_2 and select_3 by specifying select_2.blog_id = select_3.blog_id (or similar).

In regards to you pseudo code, I take the following interpretation of what the code says:

  1. If there is a tag from org=22 and there IS NOT A blog_tag from org=4 where an active 'foo' blog has the same priority and the tag has the same name.
  2. If there is a tag from org=22 and there IS A blog_tag from org=4 where an active 'foo' blog has the same priority and the tag has the same name.

In case 1. Select_2 will return a bunch of rows where the combination of tags, blogs and blog_tag EXCLUDE the matching condition. Which might be tags a, b, d & f for instance.

In case 2. Select_2 will return a bunch of rows where the combination of tags, blogs and blog_tag INCLUDE the matching condition. Which might be tags c, e, g, k for instance.

Either way Select_2 has found something, which is all it needs to return all the results.

NOTE: the aliasing between queries is very ad hoc and hard to see where the specific tables are used within each query. blog_tags is not aliased in either select_1 or select_3, blogs is aliased to t. I suggest always using the same acronym for a table in all instances within a query (ie blog_tags is always bt) and then append a number for each instance (ie bt1, bt2 etc). As follows:

SELECT count(*) FROM blog_tags AS bt  -- add alias
WHERE bt.subscribed = true
AND bt.special = true
AND EXISTS (
    SELECT 1
    FROM tags AS t1   -- add alias
    INNER JOIN blog_tags AS bt1 ON bt1.tag_id = t1.id -- change alias
    INNER JOIN blogs AS b1 ON bt1.blog_id = b1.id  -- change alias
    WHERE b1.org_id = 22
    AND NOT EXISTS ( /* Removing the "NOT" on this line has no effect */
        SELECT 1
        FROM blog_tags AS bt2 -- change alias
        INNER JOIN tags AS t2 ON bt2.tag_id = t2.id -- change alias
        INNER JOIN blogs AS b2 ON bt2.blog_id = b2.id -- change alias
        WHERE b2.org_id = 4
        AND b2.active = true
        AND b2.type = 'foo'
        AND b2.priority_id = b1.priority_id
        AND t2.name = t1.name
    )
);
simo.3792
  • 2,102
  • 1
  • 17
  • 29