-1

Why use join since you can get the data from different table like using the subqueries below?

enter image description here

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Alice Xu
  • 533
  • 6
  • 20

2 Answers2

1

If you will use EXPLAIN then you will see the difference it makes when you are using sub queries and when you are using JOINS. JOINS are comparatively faster in most of the cases.

Here is a good article on JOINs vs Subqueries

In JOINs RDBMS can create an execution plan that is better for your query and can predict what data should be loaded to be processed and save time, unlike the sub-query where it will run all the queries and load all their data to do the processing.

See the test performed in the article:

enter image description here

I added 100,000 businesses and 150,000 messages (random types for random businesses).

Method 1: JOIN I believed for the longest time that JOINs were faster than subqueries so I first tried this method. (Note: I had to set SET SQL_BIG_SELECTS=1; just to even get the query to run… bad sign!).

SELECT SQL_NO_CACHE b.id, b.name
FROM `businesses` b
LEFT JOIN `business_messages` bm ON
    b.id = bm.business_id
    AND bm.type = 'action'
WHERE
    bm.type IS NULL
LIMIT 10;

Method 2: Subquery in WHERE This method was sent to me by my cousin. It’s a lot easier to read and understand what the query is trying to do:

SELECT SQL_NO_CACHE id, name 
FROM businesses b 
WHERE NOT EXISTS (
    SELECT 1 
    FROM business_messages 
    WHERE business_id = b.id 
        AND type = 'action'
) 
LIMIT 10;

Method 3: Subquery in JOIN This is an idea I had after seeing the subquery in action:

SELECT SQL_NO_CACHE id, name 
FROM businesses b 
LEFT JOIN (
    SELECT business_id, type 
    FROM business_messages 
    WHERE type = 'action' 
    GROUP BY business_id
) bm 
    ON bm.business_id = b.id 
WHERE bm.type IS NULL 
LIMIT 10;

Results

  Method 1    Method 2    Method 3
1 5.51s   3.91s   0.41s
2 6.60s   4.74s   0.36
3 7.68s   2.40s   0.64s
4 4.23s   2.63s   0.36s
5 6.71s   2.06s   0.30s
6 5.23s   2.01s   0.53s
7 4.03s   2.80s   0.43s
8 4.56s   2.44s   0.55s
9 7.13s   1.95s   0.80s
10    7.04s   1.38s   0.43s
Average   5.872s  2.632s  0.481s
Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331
0

If you are going to use execution plan, you will see the difference. It is always best to use JOIN than using sub query when it comes to optimization, because sub query may take too much execution time especially when you are handling a large amount of data.

japzdivino
  • 1,736
  • 3
  • 17
  • 25