1

Currently, my code reads like this:

current_user.association.includes(a: [:b, {c: :d}, {e: :f}]).to_a

When doing a call, it seems every single includes is called through its own SELECT call to the DB.

However, when I do current_user.association.eager_load(a: [:b, {c: :d}, {e: :f}]).to_a I see one huge SELECT call.

I ask because I haven't seen this raised before. I would assume that the eager_load is more efficient due to less DB calls.

David
  • 7,028
  • 10
  • 48
  • 95
  • I would suggest either asking this in chat. Efficiency of _working code_ is a little off topic for stack overflow. – onebree Oct 08 '15 at 17:54

2 Answers2

4

As I can't infer the query from your description (a: [:b, {c: :d}, {e: :f}]), I need to talk about includes for a little bit.

includes is a query method which accommodates in different situations.

Here are some example code:

# model and reference
class Blog < ActiveRecord::Base
  has_many :posts

  # t.string   "name"
  # t.string   "author"
end

class Post < ActiveRecord::Base
  belongs_to :blog

  # t.string   "title"
end

# seed
(1..3).each do |b_id|
  blog = Blog.create(name: "Blog #{b_id}", author: 'someone')
  (1..5).each { |p_id| blog.posts.create(title: "Post #{b_id}-#{p_id}") }
end

In one case, it fires two separate queries, just like preload.

> Blog.includes(:posts)
  Blog Load (2.8ms)  SELECT "blogs".* FROM "blogs"
  Post Load (0.7ms)  SELECT "posts".* FROM "posts" WHERE "posts"."blog_id" IN (1, 2, 3)

In another case, when querying on the referenced table, it fires only one LEFT OUTER JOIN query, just like eager_load.

> Blog.includes(:posts).where(posts: {title: 'Post 1-1'})
  SQL (0.3ms)  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id" WHERE "posts"."title" = ?  [["title", "Post 1-1"]]

So, I think you may asking for the different part of includes and eager_load, which is

Should we use two separate queries or one LEFT OUTER JOIN query for the sake of efficiency and optimisation?

This also confuses me. After some digging, I've found this article by Fabio Akita convinced me. Here are some references and example:

For some situations, the monster outer join becomes slower than many smaller queries. The bottom line is: generally it seems better to split a monster join into smaller ones. This avoid the cartesian product overload problem.

The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.

Example for query data from Rails

> Blog.eager_load(:posts).map(&:name).count
  SQL (0.9ms)  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id"
 => 3

Example for SQL data returned from LEFT OUTER JOIN query

sqlite>  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id";
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|1|Post 1-1|2015-11-11 15:22:35.053689|2015-11-11 15:22:35.053689|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|2|Post 1-2|2015-11-11 15:22:35.058113|2015-11-11 15:22:35.058113|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|3|Post 1-3|2015-11-11 15:22:35.062776|2015-11-11 15:22:35.062776|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|4|Post 1-4|2015-11-11 15:22:35.065994|2015-11-11 15:22:35.065994|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|5|Post 1-5|2015-11-11 15:22:35.069632|2015-11-11 15:22:35.069632|1
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|6|Post 2-1|2015-11-11 15:22:35.078644|2015-11-11 15:22:35.078644|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|7|Post 2-2|2015-11-11 15:22:35.081845|2015-11-11 15:22:35.081845|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|8|Post 2-3|2015-11-11 15:22:35.084888|2015-11-11 15:22:35.084888|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|9|Post 2-4|2015-11-11 15:22:35.087778|2015-11-11 15:22:35.087778|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|10|Post 2-5|2015-11-11 15:22:35.090781|2015-11-11 15:22:35.090781|2
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|11|Post 3-1|2015-11-11 15:22:35.097479|2015-11-11 15:22:35.097479|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|12|Post 3-2|2015-11-11 15:22:35.103512|2015-11-11 15:22:35.103512|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|13|Post 3-3|2015-11-11 15:22:35.108775|2015-11-11 15:22:35.108775|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|14|Post 3-4|2015-11-11 15:22:35.112654|2015-11-11 15:22:35.112654|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|15|Post 3-5|2015-11-11 15:22:35.117601|2015-11-11 15:22:35.117601|3

We got the expected result from Rails, but bigger result from SQL. And that's the efficiency lose for the LEFT OUTER JOIN.

So my conclusion is, prefer includes over eager_load.


I've concluded a blog post about Preload, Eager_load, Includes, References, and Joins in Rails while researching. Hope this can help.

Reference

Community
  • 1
  • 1
ifyouseewendy
  • 6,674
  • 1
  • 21
  • 26
  • Yeah, I came to the conclusion as well that includes is definitely preferred. eager_load should only be used in certain cases, and I would recommend testing to make sure you should use eager_load where you think you need it. I performed my own tests on the matter and eager_load takes much longer than includes in many cases. As shown in the other answer, for 4,000 records, a complicated query could take 36 times as long with eager_load versus includes. – David Nov 11 '15 at 20:48
1

So, as it turns out, at one point ActiveRecord actually attempted to get everything into one query, but then opted it wasn't such a good idea.

I explored this with my query above and 4000 records.

A quick analysis:

eager_load took 2,600 milliseconds. includes took 72 milliseconds.

eager_load took 36 times as long as includes.

David
  • 7,028
  • 10
  • 48
  • 95