1

Question: What should I take into account when make performance comparison of those two queries?

Queries:

SELECT id 
FROM table 
WHERE id = (
    SELECT table_id_fk 
    FROM another_table 
    WHERE id = @id
)

vs

SELECT id 
FROM table t 
JOIN another_table at
ON t.id = at.table_id_fk
WHERE at.id = @id

Actually they do the same but in other ways.

P.S. It's not enough to just launch it on my server and look at response time. I want to understand the difference and understand what happens when, for instance, my db will grow up.

VB_
  • 45,112
  • 42
  • 145
  • 293
  • 3
    _"Can you compare performance"_ no, you have to compare it yourself. We don't have your server and database and we even don't know which indexes you've created. – Tim Schmelter Jun 05 '14 at 08:59
  • @TimSchmelter sorry, I just mean that you provide me reasoning when one may be better than another. Of course I can check it on my server. But I want to understand the difference between two queries. Moreover when my DB grow up, the performance characteristics may change. – VB_ Jun 05 '14 at 09:01
  • 1
    @V_B check this link may helpful to you http://stackoverflow.com/questions/2577174/join-vs-sub-query – Sadikhasan Jun 05 '14 at 09:02
  • 1
    @Sadikhasan thanks, useful reference – VB_ Jun 05 '14 at 09:03
  • You should learn how to explain queries. That would allow you to answer questions like this for yourself. In this case though, I suspect that the join would be quicker. – Philip Sheard Jun 05 '14 at 09:07
  • If you really want to understand `WHAT` (the internal) the behavior is you should take a look at the execution plans taken for both queries. – DrCopyPaste Jun 05 '14 at 09:07
  • @V_B: afaik in MySql a Join wins in most cases, but other dbms can optimize sub-queries in a way that you should use the most readable/logically correct way. – Tim Schmelter Jun 05 '14 at 09:07
  • In this case the example is so simple that I suspect MySQL would manage to optimise the first into the 2nd (assuming that which id is returned by the select is specified). – Kickstart Jun 05 '14 at 09:09
  • @Kickstart Hm, maybe - but I'm not so sure. – Strawberry Jun 05 '14 at 09:11
  • Just tried and sql fiddle ( http://www.sqlfiddle.com/#!2/7696c/1 ) and the explains are different. – Kickstart Jun 05 '14 at 09:18
  • 1
    http://www.exacthelp.com/2012/10/subquery-vs-inner-join-which-one-is.html – Mr X Jun 05 '14 at 10:29
  • Easy to test - MySQL Workbench check Duration: join was 4x as fast – Wolfi Sep 04 '17 at 21:21

3 Answers3

2

Performance is not always the main option when writing SQL queries. I would choose the first query for readability. "Give me the record of table that is referenced by the record of another_table with @id". This is straight-forward and easy to read. As to performance: You access one record by primary key to access another record (in another table) by Primary key. This can hardly get any faster.

The second statement joins both tables to get to the id (the selected id lacks the qualifier t by the way). So it does the same, but is not obvious at first glance. "Join both tables, but limit this to the another_table record with @id and give me table's id". This means the same, but gives the dbms the freedom to choose how to execute it. It could for instance join all records first and then remove all where @id doesn't match. However a good dbms won't do this; it will create the same execution plan as for statement 1.

Good dbms detect situations like these, they re-write qwueries internally, find out that the queries mean the same and come to the same execution plan. This is getting better and better, but it doesn't always work perfectly. So sometimes it does matter how to write the statement. When statements become more complex, then sometimes the second syntax to join everything and filter what you need leads to better execution plans. Unfortunately. So you often have to decide between readability and performance. I usually write my queries as readable as possible and change them only when it comes to performance problems.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Kettener How do you think: MySQL is good dbms? – VB_ Jun 05 '14 at 09:27
  • While I have sympathy with writing for readability, I can't say I find the first query more readable. To me it is less readable. Playing on sqlfiddle it seems that MySQL doesn't manage to come up with the same execution plan (for some strange reason it seems to have decided that the first ones sub query is regared as `UNCACHEABLE SUBQUERY`) – Kickstart Jun 05 '14 at 09:30
  • @V_B: Well, good question ;-) I don't know, I haven't worked with MySQL myself. I would think it's a good dbms, but I've also read that its query optimizer sometimes has problems to find the perfect execution plan, especially with subqueries. Quick googleing gave me this: http://www.iheavy.com/2013/06/26/mysql-subquery-optimization/. Subquery optimization really seems to be a weakness of MySQL. Kickstart's examinations confirm this. – Thorsten Kettner Jun 05 '14 at 09:33
  • @Kickstart: As to readability, at least one sees at first glance that query 1 tries to select exactly 1 record `where id = ...`. I don't think that can be any more readable. But, well, it's probably what one is used to. If MySQL doesn't handle subqueries well, you avoid them day after day and you are used to reading joins rather than subqueries. In Oracle, that I am used to, you even write IN clauses on tuples, which is very convenient. As to MySQL's query optimizer: They hopefully know their problems by now and are in the progress of enhancing it. – Thorsten Kettner Jun 05 '14 at 10:20
  • Some of the problems are being fixed (there is a link from the page you linked to giving details of improvements from MySQL 5.6). You can happily use `IN (subquery)` in MySQL (although this is poorly optimised for large results from the sub query) and this is one of the things that (to me) makes it less readable. To me with results I am thinking of overlapping sets with common or uncommon groupings being returned, while the sub query in the while feels like deviating from sets. But I agree it is largely what you are used to. – Kickstart Jun 05 '14 at 10:36
1

According to me, you just want to ask which is better. A Co-related sub-query or a SQL Join. Here is the explanation:-

A "correlated subquery" (i.e., one in which the where condition depends on values obtained from the rows of the containing query) will execute once for each row. A non-correlated subquery or sql join (one in which the where condition is independent of the containing query) will execute once at the beginning. The SQL engine makes this distinction automatically.

So according to me join will give fastre result than the co-related subquery. But on real ground you have to check the performance of these queries on your system as well for actual result.

Ankit Bajpai
  • 13,128
  • 4
  • 25
  • 40
  • It doesn't appear to be a correlated sub query. It does not depend on the outer query. – Kickstart Jun 05 '14 at 09:12
  • But at the whole definitely joins will perform faster than the sub query. – Ankit Bajpai Jun 05 '14 at 09:13
  • @AnkitBajpai yes, the `id` value is constant. But thanks for that information, I didn't know that – VB_ Jun 05 '14 at 09:14
  • The reason joins normally perform faster than a correlated sub query, is that with a correlated sub query the query has to get all the rows from the main query, then for each row it has to perform the sub query. With a join it can just do a query once. Even a query joining against a non correlated sub query it can just perform the sub query once to cover all rows. However there are added issues. If joining against a sub query that brings back many records most of which are ignored then joining against its results can be a large overhead. – Kickstart Jun 05 '14 at 09:21
1

With a minor bit of investigation.

I have sympathy with the point above that readability is important, although I find the join to be readable while sub queries I find less readable (although in this case the sub query is quite simple so not a major issue either way).

Normally I would hope that MySQL would manage to optimise a non correlated sub query away and execute it just as efficiently as if it were a join. This sub query at first glance does appear to be non correlated (ie, the results of it do not depend on the containing query).

However playing on SQL fiddle this doesn't appear to be the case:-

http://www.sqlfiddle.com/#!2/7696c/2

Using the sub query the explain says it is an UNCACHEABLE SUBQUERY which from the manual is :-

A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query

Doing much the same sub query by specifying the value rather than passing it in as a variable gives a different explain and just describes it as a SUBQUERY . This I suspect is just as efficient as the join.

My feeling is that MySQL is confused by the use of the variable, and has planned the query on the assumption that the value of the variable can change between rows. Hence it needs to re execute the sub query for every row. It hasn't managed to recognise that there is nothing in the query that modifies the value of the variable.

If you want to try yourself here are the details to set up the test:-

CREATE TABLE `table`
(
    id INT,
  PRIMARY KEY id(id)
);

CREATE TABLE another_table
(
    id INT,
    table_id_fk INT,
    PRIMARY KEY id (id),
    INDEX table_id_fk (table_id_fk)
);

INSERT INTO `table`
VALUES
(1),
(2),
(3),
(4),
(5),
(6),
(7),
(8);

INSERT INTO another_table
VALUES
(11,1),
(12,3),
(13,5),
(14,7),
(15,9),
(16,11),
(17,13),
(18,15);

SQL to execute:-

SET @id:=13;

SELECT t.id 
FROM `table` t
WHERE id = (
    SELECT table_id_fk 
    FROM another_table 
    WHERE id = @id
);



SELECT t.id 
FROM `table` t 
JOIN another_table at
ON t.id = at.table_id_fk
WHERE at.id = @id;


SELECT t.id 
FROM `table` t
WHERE id = (
    SELECT table_id_fk 
    FROM another_table 
    WHERE id = 13
);

EXPLAIN SELECT t.id 
FROM `table` t
WHERE id = (
    SELECT table_id_fk 
    FROM another_table 
    WHERE id = @id
);



EXPLAIN SELECT t.id 
FROM `table` t 
JOIN another_table at
ON t.id = at.table_id_fk
WHERE at.id = @id;


EXPLAIN SELECT t.id 
FROM `table` t
WHERE id = (
    SELECT table_id_fk 
    FROM another_table 
    WHERE id = 13
);

Explain results:-

ID  SELECT_TYPE     TABLE   TYPE    POSSIBLE_KEYS   KEY     KEY_LEN     REF     ROWS    EXTRA
1   PRIMARY     t   index   (null)  PRIMARY     4   (null)  8   Using where; Using index
2   UNCACHEABLE SUBQUERY    another_table   const   PRIMARY     PRIMARY     4   const   1   

ID  SELECT_TYPE     TABLE   TYPE    POSSIBLE_KEYS   KEY     KEY_LEN     REF     ROWS    EXTRA
1   SIMPLE  at  const   PRIMARY,table_id_fk     PRIMARY     4   const   1   
1   SIMPLE  t   const   PRIMARY     PRIMARY     4   const   1   Using index

ID  SELECT_TYPE     TABLE   TYPE    POSSIBLE_KEYS   KEY     KEY_LEN     REF     ROWS    EXTRA
1   PRIMARY     t   const   PRIMARY     PRIMARY     4   const   1   Using index
2   SUBQUERY    another_table   const   PRIMARY     PRIMARY     4       1   
Kickstart
  • 21,403
  • 2
  • 21
  • 33