8

I'm having an inner debate at my company about looping queries in this matter:

$sql = "
  SELECT foreign_key
  FROM t1";

foreach(fetchAll($sql) as $row)
{
  $sub_sql = "
    SELECT *
    FROM t2
    WHERE t2.id = " . $row['foreign_key'];

  foreach(fetchAll($sub_sql) as $sub_row)
  {
    // ...
  }
}

Instead of using an sql join like this:

$sql = "
  SELECT t2.*
  FROM t2
  JOIN t1
  ON t1.foreign_key = t2.id";

foreach(fetchAll($sql) as $row)
{
  // ...
}

Additional information about this, the database is huge, millions of rows.

I have of course searched an answer to this question, but nobody can answer this in a a good way and with a lot of up votes that makes me certain that one way is better then the other.

Question

Can somebody explain to me why one of thees methods is better then the other one?

superhero
  • 6,281
  • 11
  • 59
  • 91
  • Depends on the case. There is no single answer whether multiple selects are faster than joins. You will have to test for yourself which one is faster. Also check: http://stackoverflow.com/questions/1067016/join-queries-vs-multiple-queries – Anonymoose Aug 23 '13 at 14:11

2 Answers2

11

The join method is generally considered better, if only because it reduces the overhead of sending queries back and forth to the database.

If you have appropriate indexes on the tables, then the underlying performance of the two methods will be similar. That is, both methods will use appropriate indexes to fetch the results.

From a database perspective, the join method is far superior. It consolidates the data logic in one place, making the code more transparent. It also allows the database to make optimizations that might not be apparent in application code.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I have to say that I generally disagree, not because your wrong but because I also program in PHP. My data is often related by 3+ relationships. Joining that data to get one result back makes it a lot harder and harder to read when you have to spit out html elements per the relationships. i.e. knowledge of relationships is lost or harder to glean once you do a join, and that is often needed after the query... but joins will usually be faster. – gunslingor Apr 12 '17 at 15:37
  • @gunslingor and Gordon, I just got an issue recently that db tables are huge without permission to add index for that column, in this case, is it using loop more efficient than query on that un-indexed column? – tim Dec 11 '18 at 02:58
  • @tim . . . The database should still be faster than a loop, because data only needs to go into/out of the database once. Both will be really slow, though. – Gordon Linoff Dec 11 '18 at 03:25
8

Because of driver overhead, a loop is far less efficient

This is similar to another question I answered, but different enough not to cv. My full answer is here but I'll summarize the main points:

Whenever you make a connection to a database, there are three steps taken:

  1. A connection to the database is established.
  2. A query, or multiple queries, to the database is executed.
  3. Data is returned for processing.

Using a loop structure, you will end up generating additional overhead with driver requests, where you will have a request and a return per loop cycle rather than a single request and single return. Even if the looped queries do not take any longer than the single large query (this is very unlikely as MySQL internals have a lot of shortcuts built in to prevent using a full repetitive loop), you will still find that the single query is faster on driver overhead.

Using a loop without TRANSACTIONS, you will also find that you run into relational data integrity issues where other operations affect the data you're iterating between loop cycles. Using transactions, again, increases overhead because the database has to maintain two persistent states.

Community
  • 1
  • 1
Glitch Desire
  • 14,632
  • 7
  • 43
  • 55