Interesting observation for SQL WHERE EXISTS in subqueires

Question

I'd like to know if someone can explain why the following occurs:

I have the following Table which I made up. It has 2 columns "A" and "B":

If I run the following Query :-

SELECT * FROM 
    (SELECT A,B FROM Table_1) T1
WHERE EXISTS 
    (SELECT 'X' FROM Table_1 WHERE A = 3)

I get the entire table. This I understand cause the EXISTS clause checks if it finds a single row in a statement which it does

However, If I run the following query, I only get a portion of the table

SELECT * FROM 
    (SELECT A,B FROM Table_1) T1
WHERE EXISTS 
    (SELECT 'X' FROM Table_1 WHERE T1.A=3 OR T1.A=4)

The portional results are as follows

A     B
========
3,    13
4,    14

Can anyone explain why this happens?

You are probably confusing yourself by adding an extra `or` condition to the second query. That is not what is causing the behavior you see. It is the fact that you use `T1` as a prefix in the where clause that makes the difference. — Mikael Eriksson, Jan 13 '12 at 06:58

Borealid · Accepted Answer · 2012-01-13T05:27:30.150

Let's break down what you're actually asking the database server to do.

SELECT * FROM - get all fields. I'm going to ignore this bit from now on, since it's not important.

(SELECT A,B FROM Table_1) T1 - get all rows from the table, and name that result set "T1".

WHERE EXISTS (SELECT 'X' FROM Table_1 WHERE A = 3) - Select, INDEPENDENT of the above, all rows from the table. This happens once per row in the table - but always does the same thing, since T1 is not used in the subquery. If one of the rows of the table has A=3 (always the case), do no restriction. Otherwise, discard that particular T1 (this does not happen here).

WHERE EXISTS (SELECT 'X' FROM Table_1 WHERE T1.A=3 OR T1.A=4) - this one is tricky. You're making a second selection of all rows from the table, but you're restricting by this condition: T1.A=3 OR T1.A=4. This condition is based on the result of T1 - but that's not the whole table, just a particular row. When you say SELECT * FROM mytable WHERE mytable.A=3, you don't mean "select all rows of mytable if some row has A=3", you only mean to select those particular rows where that is true. So, the subquery SELECT 'X' FROM Table_1 WHERE T1.A=3 OR T1.A=4 contains either no rows or all the rows in Table_1, depending on which value is in T1.A. Because you use EXISTS, you get true for all rows and false for none.

That's why you get different results - your subquery is performed once per row in Table_1. In the first case, it always contains one row. In the second case, it contains either no rows or all nine, depending on the particular row of Table_1 contained in T1.

score 1 · Answer 2 · answered Jan 13 '12 at 05:18

It's probably easier if you remove some of the subqueries. Your queries essentially come down to the following:

Query 1: Select all records from Table_1 where true. This is because there are A fields in Table_1 that contain 3 or 4.

Query 2: Select all records from Table_1 where A equals 3 or A equals 4

Expressed in SQL, your queries can be simplified to

SELECT * FROM Table_1

and

SELECT * FROM Table_1
WHERE A IN (3,4)

score 1 · Answer 3 · answered Jan 13 '12 at 05:19

1

I believe it would be examining the current row of T1 to see if A=3 or A=4, similar to if you wrote:

SELECT * FROM 
  (SELECT A,B FROM Table_1) T1
WHERE A=3 OR A=4;

answered Jan 13 '12 at 05:19

Ilion

6,772
3
24
47

Larry Lustig · Answer 4 · 2012-01-13T14:35:59.513

Wow, cool.

You have a little extra SQL in there. What you're really doing is:

 SELECT * FROM Table_1 T1 WHERE EXISTS 
    (SELECT 'X' FROM Table_1 WHERE T1.A = 3 OR T1.A = 4)

So what's happening?

As you read each row of Table_1 as T1, the engine evaluates the sub-select in order to decide whether the row from T1 should be included in the result set.

On the first row, where T1.A is 1, then T1.A = 3 is FALSE and T1.A = 4 is FALSE, so the sub-select fails. That row is not included in the result set.

Same when A = 2. The row with A = 2 is not included.

But for the next two rows (T1.A of 3 and 4) at least one side of the OR evaluates to TRUE and the sub-select succeeds. So those two rows are included.

When you hit T1.A of 5, and for the rest of the table, the sub-select fails.

Oops, cut-and-paste error from the original question's SQL. Fixed now. — Larry Lustig, Jan 13 '12 at 14:35

score 1 · Answer 5 · edited May 23 '17 at 12:12

There are always multiple ways to write the same thing in SQL. For example this

SELECT * FROM 
    (SELECT A,B FROM Table_1) T1

...may be rewritten as this:

SELECT A, B
  FROM Table_1

The latter is simpler and I see no reason to prefer the former. Rewriting your first query accordingly

SELECT A, B
  FROM Table_1
 WHERE EXISTS (
               SELECT 'X' 
                 FROM Table_1 
                WHERE A = 3
              );

I've removed the correlation name T1 because it server no purpose. The subquery does not reference its 'outer' table expression so each appearance Table_1 does not need to be disambiguated.

I think you understand what is going on here: if one or more rows in Table_1 satisfy the search condition A = 3 then the entire table is returned, otherwise the empty set is returned. While it is a valid query it is not often a very useful construct.

For your second query, however, at least one correlation name is required because the subquery does reference its outer table:

SELECT A, B
  FROM Table_1 T1
 WHERE EXISTS (
               SELECT 'X' 
                 FROM Table_1 T2
                WHERE T1.A IN (3, 4)
              );

Again this is semantically equivalent to your second query. Note that I've given the apperance of Table_1 in the subquery the correlation name T2 but T2 does not appear in the subquery's WHERE clause. Because T2 is not being used, we can remove the subquery (hence the need for correlation names) entirely:

SELECT A, B
  FROM Table_1
 WHERE A IN (3, 4);

It's worth pointing out that the ability for a subquery to reference an 'outer' table expression is usually exploited as a 'correlated subquery' i.e. the search condition (WHERE clause) involves both 'inner' and 'outer' tables (your second query's search condition involves the 'outer' table only).

Using the usual parts and suppliers database, here is an example of a correlated subquery to implement a semijoin to find suppliers (S) who supply (SP) at least one part:

SELECT SNO, SNAME
  FROM S
 WHERE EXISTS (
               SELECT *
                 FROM SP
                WHERE SP.SNO = S.SNO
              );

Note the subquery's search condition relates the 'outer' table SP to the 'inner' table S. Also, the projection SELECT SNO, SNAME in the 'outer' table does not require the inclusion of the correlation name S because SP from the 'outer' table is not in scope for the 'inner' table.

Interesting observation for SQL WHERE EXISTS in subqueires

5 Answers5