0

I have a Rails command:

Product.includes(:sale, :product_description).select('distinct on (size) *')

It is my understanding that when I run this command, Rails will eagerly load the results into memory, and since there are only 4 sizes, the result should only contain 4 records. However, if I do something like:

result = Product.includes(:sale, :product_description).select('distinct on (size) *')

and then do: result.size. The result will be a count of all the product records, which is 435. But if I do: result.length then I will get the expected response, which is 4.

Why am I getting different results with length and size? I thought that by using includes everything is eagerly loaded so length and size should return the same count?

Also note, if I run length first and then size, both return the correct result of 4. But if I run size first, then I get the wrong response of 435.

Darkisa
  • 1,899
  • 3
  • 20
  • 40

1 Answers1

2

There are subtile differences between count, size and length on ActiveRecord::Relations (see this answer about the details)

In your example length will always perform the exact query as defined on the ActiveRecord::Relation and will then call length on the returned array.

Whereas result.size will perform a count query when the ActiveRecord::Relation wasn't loaded before. Because count returns a number (and not actual records) the query is optimized and the include and the select parts are ignored. This makes sense from a performance point of view.

But if the Relation is already loaded (for example when you call length before size) then size will just return the size of the already loaded array.

spickermann
  • 100,941
  • 9
  • 101
  • 131
  • Thanks for the response. That make sense, but I thought if I used includes then the Relation should load the array since it uses eager loading, meaning that size should not ignore the include or select? Why would size ignore the includes and select parts? – Darkisa Sep 04 '18 at 17:46
  • No, because `includes` doesn't guarantee to run a SQL `JOIN` and to result in only one single query. `includes` might cause in two queries - one for the records and a second loading all associated records at once (What is still faster than N+1 queries - the problem `includes` aiming to solve. – spickermann Sep 04 '18 at 17:52
  • 1
    And `select` is ignored because `count` uses its own `SELECT COUNT(*)`. – spickermann Sep 04 '18 at 17:56
  • Got it. Thanks! – Darkisa Sep 04 '18 at 17:58