I think you are misunderstanding how the correlated subquery works together with EXISTS
.
An equal comparison (=
) evaluates a particular value against another particular value and returns true or false. It can't evaluate multiple values, unless you add additional logical operators AND
/ OR
. The IN
operator is just a way to simplify a bunch of OR
with =
operators so it's easier to read as Zohar explained in another answer.
In the other hand, the EXISTS
operator uses an left semi join to check for the existence of a particular record. EXISTS
is used in boolean contexts whenever you want to check if a particular row exists or not. The SQL engine stops searching for matching rows as soon as it finds the first one. This is the left semi join's purpose and one of the differences with left outer join (other than retrieving the joining table's data and matching rows amounts).
So when you write:
FROM
sales.customers c
WHERE
EXISTS (
SELECT *
FROM sales.orders o
WHERE o.customer_id = c.customer_id
)
You are using a correlated sub-query linking customers
with orders
. The subquery is used in the context of an EXISTS
operator, this will make the engine search, for each row of sales.customers
, if there is at least 1 row in sales.orders
that satisfies this condition:
WHERE o.customer_id = c.customer_id
This condition will result as false
on each order that's not from the customer we are currently checking. The engine will ignore these rows because we are looking for existence. Only the rows from customers
which have a customer_id
that produces a row in the subquery will be returned.
If we change the condition to an IN
:
FROM
sales.customers c
WHERE
EXISTS (
SELECT *
FROM sales.orders o
WHERE o.customer_id IN (c.customer_id)
)
The subquery will check for existence on table sales.orders
that satisfies the condition:
WHERE o.customer_id IN (c.customer_id)
This happens to be the same c.customer_id
that we referenced in the =
example. The behaviour for the engine will be the same as the previous example; check if there is at least 1 row on orders
that match the customer_id
from customers
.
Hence, both =
and IN
will work the same way.