2

I have a doubt in mind when retrieving data from database. There are two tables and master table id always inserted to other table.

I know that data can retrieve from two table by joining but want to know, if i first retrieve all my desire data from master table and then in loop (in programing language) join to other table and retrieve data, then which is efficient and why.

Avinash
  • 105
  • 1
  • 1
  • 9
  • 11
    Doing the work in the database is almost always the better ideas. Databases are designed to handle complex queries. And a simple join isn't really a very complicated query. – Gordon Linoff Jan 27 '15 at 13:44
  • what possible benefit do you imagine to do the work outside the database? – Randy Jan 27 '15 at 13:53
  • it is only two tables if i have to retrieve data from 8 tables then is it better to specific data search into other tables – Avinash Jan 27 '15 at 13:55
  • 1
    A set-based solution is almost always superior in multiple ways, not least performance. Provide a test case with table definitions and sample data, and somebody will find an SQL-only solution. Here is an example: http://stackoverflow.com/questions/10471757/insert-rows-into-multiple-tables-in-a-single-query-selecting-from-an-involved-t/10472136#10472136 – Erwin Brandstetter Jan 27 '15 at 14:00
  • This is called lazy fetching. It's fine, but it's slow. If you don't have a problem with a slow program, just go ahead. Otherwise use the database to get all needed data in just one smart query. – Frank Heikens Jan 27 '15 at 14:05
  • The second method you described implements the join itself, but with extra network usage and a whole lot of extra bad things. – Erkan Haspulat Jan 27 '15 at 14:05

3 Answers3

1

As far as efficiency goes the rule is you want to minimize the number of round trips to the database, because each trip adds a lot of time. (This may not be as big a deal if the database is on the same box as the application calling it. In the world I live in the database is never on the same box as the application.) Having your application loop means you make a trip to the database for every row in the master table, so the time your operation takes grows linearly with the number of master table rows.

Be aware that in dev or test environments you may be able to get away with inefficient queries if there isn't very much test data. In production you may see a lot more data than you tested with.

Nathan Hughes
  • 94,330
  • 19
  • 181
  • 276
0

It is more efficient to work in the database, in fewer larger queries, but unless the site or program is going to be very busy, I doubt that it'll make much difference that the loop is inside the database or outside the database. If it is a website application then looping large loops outside the database and waiting on results will take a more significant amount of time.

Martin
  • 22,212
  • 11
  • 70
  • 132
0

What you're describing is sometimes called the N+1 problem. The 1 is your first query against the master table, the N is the number of queries against your detail table.

This is almost always a big mistake for performance.*

The problem is typically associated with using an ORM. The ORM queries your database entities as though they are objects, the mistake is assume that instantiating data objects is no more costly than creating an object. But of course you can write code that does the same thing yourself, without using an ORM.

The hidden cost is that you now have code that automatically runs N queries, and N is determined by the number of matching rows in your master table. What happens when 10,000 rows match your master query? You won't get any warning before your database is expected to execute those queries at runtime.

And it may be unnecessary. What if the master query matches 10,000 rows, but you really only wanted the 27 rows for which there are detail rows (in other words an INNER JOIN).

Some people are concerned with the number of queries because of network overhead. I'm not as concerned about that. You should not have a slow network between your app and your database. If you do, then you have a bigger problem than the N+1 problem.

I'm more concerned about the overhead of running thousands of queries per second when you don't have to. The overhead is in memory and all the code needed to parse and create an SQL statement in the server process.

Just Google for "sql n+1 problem" and you'll lots of people discussing how bad this is, and how to detect it in your code, and how to solve it (spoiler: do a JOIN).

* Of course every rule has exceptions, so to answer this for your application, you'll have to do load-testing with some representative sample of data and traffic.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828