0

In SQL server, when using the WHERE EXISTS criteria, is there any performance difference of using an = vs an IN operator?

Example:

SELECT
    customer_id,
    first_name,
    last_name
FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT *
        FROM sales.orders o
        WHERE customer_id = c.customer_id --we can also replace = with IN - for example customer_id IN c.customer_id or say c.customer_id IN customer_id
    )

Another example would be:

SELECT
    customer_id,
    first_name,
    last_name
FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT *
        FROM #Customers x --say #Customers is temp table containing a couple of customers for which we want to show the output
        WHERE c.customer_id = x.customer_id --we can also replace = with IN - for example c.customer_id IN x.customer_id or say x.customer_id IN c.customer_id
    )

This question is marked as duplicate of this. That question is re. EXISTS vs IN. Whereas my question is re = vs IN when using EXISTS

variable
  • 8,262
  • 9
  • 95
  • 215
  • There is already a good thread on this here: https://stackoverflow.com/questions/24929/difference-between-exists-and-in-in-sql – jimmy8ball Jun 19 '19 at 09:41
  • Are you serious? Did you even read my question. I am not asking for difference between EXIST vs IN clause. I am asking for the difference between using = vs IN in an EXISTS clause! – variable Jun 19 '19 at 10:05
  • 1
    Did you try comparing execution plans? Both *should* be the same because your `IN` will only include the same column reference as your `=` and the `IN` is just a syntactic sugar for a bunch of `=` with `OR`s. – EzLo Jun 19 '19 at 10:21
  • Yes I think they are logically the same, but check the execution plan to be sure – Nick.Mc Jun 19 '19 at 10:22
  • @EzLo - but Im not using an OR! – variable Jun 19 '19 at 10:23
  • `Column IN (1, 2)` is the same as `Column = 1 OR Column = 2`. So if you only have 1 value in the `IN`... – EzLo Jun 19 '19 at 10:25
  • In my query, there is no OR; If I have 1 values then you are right; what if multiple values – variable Jun 19 '19 at 10:26
  • @variable if you have multiple values then you can't compare `=` against an `IN` unless you add `AND`s or `OR`s with the `=`. If you mean multiple values as in a column reference, a correlated subquery (which is this case) will execute once for each row of the outer set, `customer_id` will only hold 1 value "at a time". – EzLo Jun 19 '19 at 10:47
  • Agree with your last point - will hold 1 value at a time. One side of the = will hold one value. Where as the other side holds multiple values. Exists evaluates each row (one row at a time) from source table with multiple rows from the target table. So then is there any performance consideration when using = vs IN? – variable Jun 19 '19 at 10:51

2 Answers2

2

No difference whatsoever.

When you are writing

WHERE A IN(B)

It's exactly the same as writing

WHERE A = B

If either A or B is a column, expression or a constant doesn't change the nature of the IN operator.

If you where to write

WHERE A IN(B, C)

It would be the same as writing

WHERE A = B OR A = C

The fact that this condition is a part of a subquery inside an EXISTS operator is irrelevant.

Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • Your example is a = 1; how about when we have multiple values. Then = is not equivalent to IN. Unless there are OR's along with the = – variable Jun 19 '19 at 10:25
  • Yes, that's correct - but `where a in(b)` is the exact condition you have. it doesn't matter of `b` is a constant or a column name. – Zohar Peled Jun 19 '19 at 10:28
  • My original answer had the correct conclusion but for the wrong reasons, and only after I've read your comment I've realized it - hence the intensive edit. – Zohar Peled Jun 19 '19 at 10:33
1

I think you are misunderstanding how the correlated subquery works together with EXISTS.

An equal comparison (=) evaluates a particular value against another particular value and returns true or false. It can't evaluate multiple values, unless you add additional logical operators AND / OR. The IN operator is just a way to simplify a bunch of OR with = operators so it's easier to read as Zohar explained in another answer.

In the other hand, the EXISTS operator uses an left semi join to check for the existence of a particular record. EXISTS is used in boolean contexts whenever you want to check if a particular row exists or not. The SQL engine stops searching for matching rows as soon as it finds the first one. This is the left semi join's purpose and one of the differences with left outer join (other than retrieving the joining table's data and matching rows amounts).

So when you write:

FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT *
        FROM sales.orders o
        WHERE o.customer_id = c.customer_id
    )

You are using a correlated sub-query linking customers with orders. The subquery is used in the context of an EXISTS operator, this will make the engine search, for each row of sales.customers, if there is at least 1 row in sales.orders that satisfies this condition:

WHERE o.customer_id = c.customer_id

This condition will result as false on each order that's not from the customer we are currently checking. The engine will ignore these rows because we are looking for existence. Only the rows from customers which have a customer_id that produces a row in the subquery will be returned.

If we change the condition to an IN:

FROM
    sales.customers c
WHERE
    EXISTS (
        SELECT *
        FROM sales.orders o
        WHERE o.customer_id IN (c.customer_id)
    )

The subquery will check for existence on table sales.orders that satisfies the condition:

WHERE o.customer_id IN (c.customer_id)

This happens to be the same c.customer_id that we referenced in the = example. The behaviour for the engine will be the same as the previous example; check if there is at least 1 row on orders that match the customer_id from customers.

Hence, both = and IN will work the same way.

EzLo
  • 13,780
  • 10
  • 33
  • 38
  • This is a good explanation on how the `exists` operator works, but it's irrelevant to how the `in` operator works - and the fact is, that when ever an `in` operator has only one value in the list, it's exactly the same as using the `=` operator. – Zohar Peled Jun 20 '19 at 06:47