Use of IN and EXISTS in SQL

Question

Assuming that one has three Tables in a Relational Database as :

Customer(Id, Name, City),
Product(Id, Name, Price),
Orders(Cust_Id, Prod_Id, Date)

My first question is what is the best way to excecute the query: "Get all the Customers who ordered a Product". Some people propose the query with EXISTS as:

Select *
From Customer c
Where Exists (Select Cust_Id from Orders o where c.Id=o.cust_Id)

Is the above query equivalent (can it be written?) as:

 Select *
 From Customer
 Where Exists (select Cust_id from Orders o Join Customer c on c.Id=o.cust_Id)

What is the problem when we use IN instead of EXISTS apart from the performance as:

Select *
From Customer
Where Customer.Id IN (Select o.cust_Id from Order o )

Do the three above queries return exactly the same records?

Update: How does really the EXISTS evaluation works in the second query (or the first), considering that it checks only if the Subquery returns true or false? What is the "interpretation" of the query i.e.?

Select *
From Customer c
Where Exists (True)

Already answered http://stackoverflow.com/questions/2065329/sql-server-in-vs-exists-performance — Declan_K, May 14 '13 at 15:10
Yes, the three queries should return the same results. Where exists is generally faster than in. — Esoteric Screen Name, May 14 '13 at 15:12
Implementation dependant. You should specify RDBMS if asking about performance. SQL is declarative and it depends on the quirks of the optimiser. — Martin Smith, May 14 '13 at 15:14
As @MartinSmith says. Efficiency is highly dependent on the DBMS and the version (of the optimizer) and even table sizes. In MySQL, I'd choose the `EXISTS` vesrion (Q1). But with small customer table and huge order table, the `IN` might be faster. — ypercubeᵀᴹ, May 14 '13 at 15:37

score 3 · Accepted Answer · answered May 14 '13 at 15:14

3

The first two queries are different.

The first has a correlated subquery and will return what you want -- information about customers who have an order.

The second has an uncorrelated subquery. It will return either all customers or no customers, depending on whether or not any customers have placed an order.

The third query is an alternative way of expressing what you want.

The only possible issue that I can think of would arise when cust_id might have NULL values. In such a case, the first and third queries may not return the same results.

answered May 14 '13 at 15:14

Gordon Linoff

1,242,037
58
646
786

1

`IN` and `EXISTS` behave the same in the presence of `NULL`s. It is the `NOT` versions that differ. – Martin Smith May 14 '13 at 15:16
@Thomas that's DB-dependant. It's not true in MSSql, nor MySQL – Matthew May 14 '13 at 15:50
2

@MartinSmith - Wait, that is correct. SQL Server is doing it correctly because one of the expressions returns true. Not enough coffee this morning. – Thomas May 14 '13 at 15:59

score 2 · Answer 2 · edited Jun 10 '16 at 16:20

2

~~Yes, each of those three should return identical result sets.~~ Your second query is incorrect, as @ypercube points out in the commends. You're checking whether an uncorrellated subquery EXISTS

Of the two that work (1, 3), I'd expect #3 to be the fastest depending on your tables because it only executes the subquery one time.

However your most effective result is probably none of them but this:

SELECT DISTINCT
    c.*
FROM
    Customer c
JOIN
    Orders o
    ON o.[cust_id] = c.[Id]

because it should just be an index scan and a hash.

You should check the query plans and/or benchmark each one.

edited Jun 10 '16 at 16:20

marc_s

732,580
175
1,330
1,459

answered May 14 '13 at 15:13

Matthew

10,244
5
49
104

No, the second query is not equivalent. It's totally different. – ypercubeᵀᴹ May 14 '13 at 15:29
@ypercube not sure how... seems to be the same to me... but I'm willing to yield to your expertise on it. – Matthew May 14 '13 at 15:31
1

It is uncorrelated. It will either return all customers (if one or more orders exist) or no customer at all. – ypercubeᵀᴹ May 14 '13 at 15:32

CharlesC · Answer 3 · 2013-05-14T15:38:39.263

0

The best way to execute that query is to add orders to the from clause and join to it.

select distinct c.* 
from customers c, 
orders o
where c.id = o.cust_id

Your other queries may be more inefficient (depending on the shape of the data) but they should all return the same result set.

edited May 14 '13 at 15:38

answered May 14 '13 at 15:14

CharlesC

350
2
8

-1 An improperly formatted cross-join is probably **not** the best solution. – Matthew May 14 '13 at 15:14
@CharlesC, you should be able to edit your answer to correct it. – DOK May 14 '13 at 15:16
1

@Matthew - it is a valid form of the join - just not newer syntax. – Randy May 14 '13 at 15:19
@Randy it's not ANSI standard. It's a bad suggestion... and even if the syntax were ok it would still be wrong because it likely duplicates the customer records. – Matthew May 14 '13 at 15:20
2

@Matthew - agreed about the duplicates - but SQL92 standard is still very acceptable (imo) :) – Randy May 14 '13 at 15:22
2

@CharlesC none of his original queries will return duplicates. – Matthew May 14 '13 at 15:26
Oops you are correct. Edited my answer. Keeping the syntax because I love SQL92. – CharlesC May 14 '13 at 15:39
1

SQL92 stipulated `JOIN`s as I indicated. The cartestian join you proposed is old and non-standard. http://stackoverflow.com/questions/1599050/ansi-vs-non-ansi-sql-join-syntax – Matthew May 14 '13 at 15:48
1

@Randy While this is still (in SQL-2011+) valid syntax, it's not SQL-92. It's even older, SQL-89 (or older? not sure.) The `JOIN` was introduced in 1992. – ypercubeᵀᴹ May 14 '13 at 15:55

Use of IN and EXISTS in SQL

3 Answers3