8

I know that when you utilize includes and you specify a where clause on the joined table, you should use .references

example:

# will error out or throw deprecation warning in logs
users = User.includes(:orders).where("Orders.cost < ?", 20)

In rails 4 or later, you will get an error like the following:

Mysql2::Error: Unknown column 'Orders.cost' in 'where clause': SELECT customers.* FROM customers WHERE (Orders.cost < 100)

Or you will get a deprecation warning:

DEPRECATION WARNING: It looks like you are eager loading table(s) (one of: users, addresses) that are referenced in a string SQL snippet. For example:

Post.includes(:comments).where("comments.title = 'foo'") Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality. From now on, you must explicitly tell Active Record when you are referencing a table from a string:

Post.includes(:comments).where("comments.title = 'foo'").references(:comments)

If you don't rely on implicit join references you can disable the feature entirely by setting config.active_record.disable_implicit_join_references = true. (

SELECT "users"."id" AS t0_r0, "users"."name" AS t0_r1, "users"."email" AS t0_r2, "users"."created_at" AS t0_r3, "users"."updated_at" AS t0_r4, "addresses"."id" AS t1_r0, "addresses"."user_id" AS t1_r1, "addresses"."country" AS t1_r2, "addresses"."street" AS t1_r3, "addresses"."postal_code" AS t1_r4, "addresses"."city" AS t1_r5, "addresses"."created_at" AS t1_r6, "addresses"."updated_at" AS t1_r7 FROM "users" LEFT OUTER JOIN "addresses" ON "addresses"."user_id" = "users"."id" WHERE (addresses.country = 'Poland')

so we do this:

# added .references(:orders)
users = User.includes(:orders).where("Orders.cost < ?", 20).references(:orders)

And it executes just fine:

SELECT "users"."id"     AS t0_r0, 
  "users"."name"        AS t0_r1, 
  "users"."created_at"  AS t0_r2, 
  "users"."updated_at"  AS t0_r3, 
  "orders"."id"         AS t1_r0, 
  "orders"."cost"       AS t1_r1, 
  "orders"."user_id"    AS t1_r2, 
  "orders"."created_at" AS t1_r3, 
  "orders"."updated_at" AS t1_r4 
FROM "users" 
LEFT OUTER JOIN "orders" 
ON "orders"."user_id" = "users"."id" 
WHERE  ( orders.cost < 20 ) 

I know that .includes is just a wrapper for two methods: eager_load and preload. I know that since my query above is doing a filter on a joined table (orders in this example), includes is smart and knows to pick the eager_load implementation over preload because preload cannot handle doing this query since preload does not join tables.

Here is where I am confused. Ok: So on that query above: under the hood includes will utilize the eager_load implementation. But notice how when I explicitly use eager_load for this same query (which is what includes is essentially doing): I do not need to use .references! It runs the query and loads the data just fine. No error and no deprecation warning:

# did not specify .references(:orders), and yet no error and no deprecation warning 
users = User.eager_load(:orders).where("Orders.cost < ?", 20)

And it executes the exact same process with no problem:

SELECT "users"."id"     AS t0_r0, 
  "users"."name"        AS t0_r1, 
  "users"."created_at"  AS t0_r2, 
  "users"."updated_at"  AS t0_r3, 
  "orders"."id"         AS t1_r0, 
  "orders"."cost"       AS t1_r1, 
  "orders"."user_id"    AS t1_r2, 
  "orders"."created_at" AS t1_r3, 
  "orders"."updated_at" AS t1_r4 
FROM "users" 
LEFT OUTER JOIN "orders" 
ON "orders"."user_id" = "users"."id" 
WHERE  ( orders.cost < 20 ) 

That seems odd. Why does .references need to be specified for the includes version of the query, whereas .references does not need to be specified for the eager_load version of the query? What am I missing here?

Neil
  • 4,578
  • 14
  • 70
  • 155

1 Answers1

19

It comes down to the problem they mention in the deprecation warning:

Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality.

In older versions, Rails tried to be helpful about selecting the query pattern to use, and includes would use the preload strategy if it could, but switch to the eager_load strategy when it looks like you're referencing something in a joined table. But without a full SQL parser figuring out what tables are actually referenced, it's like parsing XHTML with a Regex - you can get some things done, but Rails can't decide correctly in every case. Consider:

User.includes(:orders).where("Orders.cost < 20")

This is a nice, simple example, and Rails could tell that you need Orders joined. Now try this one:

User.includes(:orders).where("id IN (select user_id from Orders where Orders.cost < 20)")

This gives the same result, but the subquery rendered joining Orders unnecessary. It's a contrived example, and I don't know whether Rails would decide the second query needed to join or not, but the point is there are cases when the heuristic could make the wrong decision. In those cases, either Rails would perform an unnecessary join, burning memory and slowing the query down, or not perform a necessary join, causing an error.

Rather than maintain a heuristic with a pretty bad failure case, the developers decided to just ask the programmer whether the join is needed. You're able to get it right more often than Rails can (hopefully), and when you get it wrong, it's clear what to change.

Instead of adding references you could switch to eager_load, but keeping includes and references separate allows the implementation flexibility in its query pattern. You could conceivably .includes(:orders, :addresses).references(:orders) and have addresses loaded in a second preload-style query because it's not needed during the join (though Rails actually just includes addresses in the join anyway). You don't need to specify references when you're using eager_load because eager_load always joins, where preload always does multiple queries. All references does is instruct includes to use the necessary eager_load strategy and specify which tables are needed.

Community
  • 1
  • 1
Kristján
  • 18,165
  • 5
  • 50
  • 62