2

What is a differenct in this SQL:

first one:

select * 
  from table_1 a 
  join table_2 b 
    on a.id = b.acc_id

second:

select * 
  from (select * from table_1) a 
  join (select * from table_2) b 
    on a.id = b.acc_id

Because the first one is being executed nearly 40+ minutes and the second one within seconds...

I'm really confused.

Maybe MySQL DB configuration is broken ?

Both tables are InnoDB and hosted on Domo

WBAR
  • 4,924
  • 7
  • 47
  • 81

1 Answers1

3

I'm very surprised that you report the query without subqueries is taking so long. I would expect the opposite. Are you sure you observed that correctly?

MySQL is capable of rewriting some subqueries like the one you show, if the subquery is simple enough. So there should be no difference.

I suspect the case you're really asking about involves a more complex query.

When you use a subquery as a table, MySQL may need to create a temporary table to store the result of the subquery, then use that temporary table for subsequent joins or searching or sorting.

Temporary tables create overhead for the query, because they need to be stored. If they're small, the temp table may reside in RAM. But if the amount of data is too great, it'll copy the table to disk in the directory named in your tmpdir config option.

You should also gather some information about how MySQL is going to run your query:

EXPLAIN select * 
  from (select * from table_1) a 
  join (select * from table_2) b 
    on a.id = b.acc_id\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: b
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: a
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY,id
          key: PRIMARY
      key_len: 8
          ref: test.b.acc_id
         rows: 1
     filtered: 100.00
        Extra: NULL

In this simple example, the EXPLAIN report is exactly the same as if we ran this equivalent query:

EXPLAIN select * 
  from table_1 a 
  join table_2 b 
    on a.id = b.acc_id\G

At least as I test this on MySQL 8.0.0-dmr. Older versions of MySQL may not support that optimization.

But again, I suspect the real case you're testing involves more complex subqueries.

You should also make sure the tables have the right indexes to allow the join to do an index lookup for the join. In the EXPLAIN report, you should see the second table report "type: ref" or "type: eq_ref".

When asking SQL questions, it would help if you post the actual query that gave you the 40 minute query time. And also run SHOW CREATE TABLE for each table in the join so we can see what indexes and constraints you have in each table.


Update: I ran the same EXPLAIN report on MySQL 5.6.33, and we can see the derived tables created from the subqueries:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived3>
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 9
          ref: a.id
         rows: 2
        Extra: NULL
*************************** 3. row ***************************
           id: 3
  select_type: DERIVED
        table: table_2
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra: NULL
*************************** 4. row ***************************
           id: 2
  select_type: DERIVED
        table: table_1
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra: NULL
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    Older versions of MySQL are going to force the inline views to be materialized as derived tables, before the outer query runs. And I too would expect the version with the derived tables to take longer to execute. But I'd also throw out the timing for the "first run" of each statement. I don't want to compare execution times of "cold cache disk io" and "warmed cache". And I don't watn the MySQL query cache to skew results, so if that's enabled, I'd only want to compare execution times of `SELECT SQL_NO_CACHE` ... +10 – spencer7593 Dec 16 '16 at 00:31