0

I'm wondering, when looking through a set of rows and for each row fetching another tables set of rows. For example, looping through a series of categories and for each category, fetching all their news articles. Perhaps to display on a single page or such. It seems like a lot of SELECT queries - 1 to get all categories, and one for each category (to get it's articles). So, my question is - is it quicker to simple do two fetches at the start:

categories = Category.all
articles = Articles.all

...and then just use select() or where() on articles by category id to only take those from the articles array? Replacing multiple SELECT queries with multiple array functions, which is quicker? I imagine also that each app, depending on number of rows, may vary. I would be interested to hear what people think, or any links that clarify this as I ddin't find much on the matter myself.

My code example above is Ruby on Rails but this question might actually apply to any given language. I also use PHP from time to time.

Martyn
  • 6,031
  • 12
  • 55
  • 121
  • 1
    I'd just time the results of the two fetches. In general, you should know how your app behaves specifically at what point you start bottlenecking on DB access, querying, etc... The strategy you provide above will start having problems when you get into larger data sets (ie imagine having 10k or 10M articles). – timpone Mar 19 '14 at 02:01

2 Answers2

0

It depends on what you want to do with your data. You could try eager loading.

categories = Category.includes(:articles)

Here's the documentation. http://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations

freddyrangel
  • 1,353
  • 1
  • 11
  • 20
0

I think you're describing what's called the N+1 problem (I'm new to this too). Here's another stack overflow question that addresses this issue generally: What is SELECT N+1?

n+1 is the worst, especially when you think about 10k or 10M articles like timpone pointed out. For 10M articles you'll be hitting the DB 10,000,001 times for a single request (hence the name n + 1 problem). Avoid this. Always. Anything is better than this.

If Category has a has_many relation to Article (and Article has a belongs_to relation to Category) you could use #includes to "pre-fetch" the association like so:

categories = Category.includes(:articles)

This will two two queries, one for the Category and one for the Article. You can write it out in two explicit select/where statements but I think doing it this way is semantically clearer. If you want to retrieve all the categories and then for each category get all the articles. You can write code like this (in Ruby):

categories.articles.each do |article|
  # do stuff...
end

and it's immediately clear that you mean "all the articles for this category instance".

Community
  • 1
  • 1