4

could you please tell me which one of these two sentences is faster?

1st sentence:

SELECT DISTINCT(t1.user_id) FROM table_users as t1 WHERE t1.user_town IN (SELECT t2.town FROM table_towns as t2)

2nd sentence:

SELECT DISTINCT(t1.user_id) FROM table_users as t1 
  INNER JOIN (SELECT t2.town FROM table_towns as t2) as t3 ON t1.user_town = t3.town

The Selects I use are usually a bit more complex. And when possible I try to use IN (...) because I can pass a value or a table select.

Seeker
  • 365
  • 1
  • 6
  • 16
  • I've changed the result from SELECT t1.* to SELECT DISTINCT(t1.user_id) to avoid different results. Since it is usually what I really care – Seeker Oct 25 '11 at 17:29

5 Answers5

2

They both look fairly similar to my eyes. I'm pretty sure that a single SELECT will provide better performance, given that MySQL is not particularly good with subqueries:

SELECT t1.*
FROM table_users t1 
INNER JOIN table_towns t2 ON t1.user_town = t2.town

In any case, you should see what the EXPLAIN plan for the query has to say and perform some benchmarking on live data.

Álvaro González
  • 142,137
  • 41
  • 261
  • 360
0

You should measure it, and issue an explain plan to be sure. But I would use neither of those:

select u.* from table_users as u
inner join table_towns as t on u.user_town = t.town
JB Nizet
  • 678,734
  • 91
  • 1,224
  • 1,255
0

IN is killing for MySQL. Rather use EXISTS.

INNER JOIN can be pretty fast, but the results are not necessarily the same. You may need grouping or distinct to get the same (or similar) results, though in this case I think you're safe.

If using an inner join, don't make it a subselect, because MySQL doesn't handle those well either. Just join on the table_towns table itself.

GolezTrol
  • 114,394
  • 18
  • 182
  • 210
  • That's no longer true for the lastest MySQL versions. `IN (select ....)` has been optimized. – Johan Oct 25 '11 at 18:24
  • Maybe not killing anymore, but still much slower in comparison. – GolezTrol Oct 26 '11 at 05:57
  • @Johan - What version is this effective from? Would be interested to hear (**from actual testing not just what you believe to be true!**) whether the huge disparity between `=` and `IN` that [is apparent here](http://stackoverflow.com/questions/3416076/this-select-query-takes-180-seconds-to-finish/3417098#3417098) is fixed. Mark Byer's answer I link to in the comments there says this isn't scheduled to be fixed until version 6.0 – Martin Smith Oct 26 '11 at 17:51
0

There's no definite answer to your question. It really depends on what are the indexes on the tables and whether the DBMS you are using can use them or not. I've had instances where an IN reduces the execution time compared to INNER JOIN and instances where INNER JOIN is faster.

Anecdotal example (using SQL Server, though):

I was joining a temp table to a regular table using a common field. The execution plan showed a HASH JOIN between the two tables when I looked at the execution plan. Execution time was 2 secs aprox. I changed the INNER JOIN to IN with a subselect ,the HASH JOIN disappeared and the execution time was cut to 1 sec aprox. It was now using the index on the main table.

Icarus
  • 63,293
  • 14
  • 100
  • 115
0

The 2 queries you posted may produce different results (second with JOIN can return more rows than the first), so I think it's not correct to compare performance.

a1ex07
  • 36,826
  • 12
  • 90
  • 103
  • you're right. I've just put distinct of only 1 field to avoid different results. – Seeker Oct 25 '11 at 17:33
  • In this case I believe the second(with JOIN) may be a bit worse (if optimizer is not smart enough to move ,`DISTINCT` to the subquery – a1ex07 Oct 25 '11 at 18:00