
I have a query which gets data by joining 3 big tables (~1mm records each); in addition, they are very busy tables.

Is it better to do the traditional joins, or rather to first fetch the values from the first table and then run a secondary query, passing the retrieved values in a comma-delimited IN clause?

Option #1

SELECT * 
FROM BigTable1 a
INNER JOIN BigTable2 b using(someField2)
INNER JOIN BigTable3 c using(someField3)
WHERE a.someField1 = 'value'

vs

Option #2

$values = SELECT someField2 FROM BigTable1 WHERE someField1 = 'value'; # (~20-200 values)

SELECT * 
FROM BigTable2
INNER JOIN BigTable3 c using(someField3)
WHERE someField2 in ($values)

Option #3

Create a temp table to store these values from BigTable1 and use it instead of joining to BigTable1 directly.
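For illustration, a minimal sketch of what Option #3 could look like in MySQL, reusing the hypothetical table and column names from Options #1 and #2:

CREATE TEMPORARY TABLE tmp_values AS
    SELECT someField2
    FROM BigTable1
    WHERE someField1 = 'value';   -- the ~20-200 values

SELECT * 
FROM tmp_values t
INNER JOIN BigTable2 b using(someField2)
INNER JOIN BigTable3 c using(someField3);

DROP TEMPORARY TABLE tmp_values;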

Any other options?

Chaim Klar
  • what unit of measure is mm? Million? Multi-million? thanks... – Menelaos Oct 23 '15 at 12:59
  • Depends on the size of the rows and cardinality. – Jimmy T. Oct 23 '15 at 13:01
  • You can also do a join against subquery results but mysql's query optimizer may or may not like that more. Use `EXPLAIN` on your queries and see which one looks like it has better performance. Something like: `SELECT * FROM table1 INNER JOIN (SELECT somefield2 FROM table2 WHERE somefield1 = 'value') as b on table1.f1 = b.f2` – Josh J Oct 23 '15 at 13:15
  • Thank you all, so I see no clear answer... :( maythesource.com each table has ~1 million records, and the first clause (the list of values) can range between 20-200. – Chaim Klar Oct 23 '15 at 14:08

2 Answers


I think the best option is to try both approaches and run EXPLAIN on them. One optimization you could make is to use a stored procedure for the second approach, which would reduce the time/overhead of having to run two separate queries from the client.
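For example, a sketch of what that comparison might look like, using the hypothetical table and column names from the question:

EXPLAIN
SELECT * 
FROM BigTable1 a
INNER JOIN BigTable2 b using(someField2)
INNER JOIN BigTable3 c using(someField3)
WHERE a.someField1 = 'value';

-- then likewise EXPLAIN both statements of Option #2 and compare the
-- chosen indexes and estimated row counts.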

Finally, joining is quite an expensive operation for very large tables, since you're essentially projecting and selecting over 1M x 1M rows. (Terms: What are projection and selection?)

Menelaos
  • @Chaim Klar It is likely that if you set appropriate keys and indexes on your tables, MySQL will do a much better job than you expect in running the JOIN query. Have a look at http://stackoverflow.com/questions/173726/when-and-why-are-database-joins-expensive – Menelaos Oct 23 '15 at 13:07

There is no definitive answer to your question, and you should profile both approaches, since the outcome depends on multiple factors.

However, the first approach is the one usually taken and should be faster if all of the tables are correctly indexed and the row sizes are typical. Also take into account that with the second approach the network latency will be far worse, since you will need multiple round trips to the DB.
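As a rough sketch of what "correctly indexed" could mean here, assuming the join and filter columns from the question (the index names are made up):

CREATE INDEX idx_bigtable1_f1 ON BigTable1 (someField1, someField2);  -- covers the WHERE filter and the join key to BigTable2
CREATE INDEX idx_bigtable2_f2 ON BigTable2 (someField2, someField3);  -- join key from BigTable1, plus the key used to reach BigTable3
CREATE INDEX idx_bigtable3_f3 ON BigTable3 (someField3);              -- join key from BigTable2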

fos.alex
  • The latency is of course only an issue if a stored procedure is not used. Additionally, he did mention that the specific tables are quite busy, meaning there may be a point in reducing overhead. However, we both agree he needs to profile both approaches! – Menelaos Oct 23 '15 at 13:03