4

Which option is better:

  1. Writing one very complex query with a large number of joins, or
  2. Writing two queries, one after the other, feeding the result set of the first query into the second.
Jonathan Leffler
Ankur Mukherjee

6 Answers

6

Generally, one query is better than two, because the optimizer has more information to work with and may be able to produce a more efficient query plan than it could for either query separately. A single query also means fewer back-and-forth exchanges between the program and the DBMS; if your DBMS is on a server on the other side of the country (or world), that can be a big factor. By contrast, using two (or more) queries typically means you'll be running the second query many times, and the DBMS might have to generate the query plan for it repeatedly (though not if you prepare the statement once and pass the parameters as placeholders each time it is re-executed).

Arguing against combining the two queries, you might end up shipping a lot of repetitive data between the DBMS and the application. If each of 10,000 rows in table T1 is joined with an average of 30 rows from table T2 (so there are 300,000 rows returned in total), then you might be shipping a lot of data repeatedly back to the client. If the row size of (the relevant projection of) T1 is relatively small and the data from T2 is relatively large, then this doesn't matter. If the data from T1 is large and the data from T2 is small, then this may matter; measure before deciding.
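A minimal sketch of the two options using SQLite; the tables `t1` and `t2` and their columns are invented for illustration, since the question names no schema:

```python
import sqlite3

# Hypothetical schema: t1 (parents) joined to t2 (children).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, item TEXT);
    INSERT INTO t1 VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO t2 VALUES (10, 1, 'pen'), (11, 1, 'ink'), (12, 2, 'paper');
""")

# Option 1: one joined query. Note the t1 columns are repeated once per
# matching t2 row -- this is the "shipping repetitive data" cost.
joined = conn.execute(
    "SELECT t1.id, t1.customer, t2.item FROM t1 JOIN t2 ON t2.t1_id = t1.id"
).fetchall()

# Option 2: two queries. Fetch t1 once, then run the second query once per
# t1 row, using a placeholder so the statement (and its plan) can be reused.
parents = conn.execute("SELECT id, customer FROM t1").fetchall()
children = {
    pid: conn.execute("SELECT item FROM t2 WHERE t1_id = ?", (pid,)).fetchall()
    for pid, _ in parents
}
```

With option 1 there is one round trip but each `t1` row's columns appear in every matching result row; with option 2 nothing is repeated, but there is one round trip per parent row.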

Jonathan Leffler
2

When I was a junior DB person I once worked for a year in a marketing dept where I had so much free time I did each task 2 or 3 different ways. I made a habit of writing one mega-select that grabbed everything in one go and comparing it to a script that built interim tables of selected primary keys and then once I had the correct keys went and got the data values.

In almost every case the second method was faster. The cases where it wasn't involved a small number of small tables. Where it was most noticeably faster was, of course, with large tables and multiple joins.

I got into the habit of selecting the required primary keys from tableA, selecting the required primary keys from tableB, and so on; joining them to select the final set of primary keys; and then using those keys to go back to the tables and get the data values.

As a DBA I now understand that this method resulted in less purging of the data cache and played nicer with others using the DB (as mentioned by Amir Raminfar).

It does, however, require the use of temporary tables, which some places/DBAs don't like (unfairly, in my mind).
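A sketch of that key-first workflow, again using SQLite; `tableA`/`tableB` and all columns are made up for illustration, since no schema is given:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (pk INTEGER PRIMARY KEY, status TEXT, payload TEXT);
    CREATE TABLE tableB (pk INTEGER PRIMARY KEY, a_pk INTEGER, region TEXT);
    INSERT INTO tableA VALUES (1,'active','big-blob-1'),
                              (2,'inactive','big-blob-2'),
                              (3,'active','big-blob-3');
    INSERT INTO tableB VALUES (10,1,'EU'), (11,2,'EU'), (12,3,'US');

    -- Step 1: interim temp tables holding only the selected primary keys.
    CREATE TEMP TABLE keysA AS SELECT pk FROM tableA WHERE status = 'active';
    CREATE TEMP TABLE keysB AS SELECT a_pk AS pk FROM tableB WHERE region = 'EU';

    -- Step 2: join the narrow key tables to get the final key set.
    CREATE TEMP TABLE final_keys AS
        SELECT keysA.pk FROM keysA JOIN keysB ON keysA.pk = keysB.pk;
""")

# Step 3: go back for the wide data values, but only for surviving keys.
rows = conn.execute(
    "SELECT tableA.pk, tableA.payload FROM tableA "
    "JOIN final_keys ON tableA.pk = final_keys.pk"
).fetchall()
```

The joins in steps 1 and 2 touch only narrow key columns; the wide payload columns are read exactly once, in step 3, for the rows that survived the filtering.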

Karl
1

It depends a lot on the actual query and the actual database engine, e.g. SQL Server, Oracle, or MySQL.

Oscar Gomez
1

At large companies, they often prefer option 2 because option 1 can hog the database CPU. That makes all other connections slow and turns the database into a bottleneck. That being said, it all depends on your data and the amount you are joining. In the worst case, joining 10,000 rows to 1,000 rows where every row matches every row gets you back 10,000 x 1,000 records (even with an inner join, if the join condition isn't selective).
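The cardinality arithmetic, sketched in Python (the per-row match count `k` is an assumption for illustration):

```python
# Join output size depends on how many right-hand rows each left-hand row
# matches. With an unselective condition (effectively a cross join), every
# left row matches every right row:
left_rows = 10_000
right_rows = 1_000
worst_case = left_rows * right_rows  # 10 million result rows

# With a selective key join matching an average of k rows each, the result
# is far smaller:
k = 3  # assumed average matches per left row, for illustration
typical = left_rows * k
```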

Possible duplicate MySQL JOIN Abuse? How bad can it get?

Amir Raminfar
1

Assuming "better" means "faster", you can easily test these scenarios in a junit test. Note that a determining factor that you may not be able to get from a unit test is network latency. If the database sits right next to your machine where you run the unit test, you may see no difference in performance that is attributed to the network. If your production servers are in another town, country, or continent from the database, network traffic becomes more of a bottleneck. You do not want to go back and forth across the wire- you more likely want to make one round trip and get everything at once.

Again, it all depends :)

Moe Matar
0

It could depend on many things:

  • the indexes you have set up,
  • how many tables are involved,
  • what the actual query is,
  • how big the data set is,
  • what the underlying DB is,
  • what table engine you are using.

The best thing to do would probably be to test both methods on a variety of test data and see which one bottlenecks.

If you are using MySQL (and maybe Oracle?) you can use

EXPLAIN SELECT ..... 

and it will give you a lot of info on how it will execute the query, and therefore how you can improve it, etc.
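SQLite has an analogous feature spelled `EXPLAIN QUERY PLAN`, which makes the idea easy to try from Python; the `orders` table and `idx_customer` index below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE INDEX idx_customer ON orders(customer_id)")

# Ask the engine how it WOULD run the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
for row in plan:
    print(row)  # the detail text shows whether idx_customer is used
```

If the plan shows a full table scan instead of the index, that is a hint to add or adjust an index before restructuring the query itself.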

loosecannon