6

I am trying to optimize my query for RAM usage and a table that is in the .includes(:ocr) has too many columns on it (and also a large-sized column). Is there a way that I can still utilize Rails' .includes() but pick out explicit .select() columns from the includes() tables?

I want to exclude using an N+1 which is what a .joins() generates.

ie.

User.select(:email).includes(:ocr => select(:small_value_only))
Kamilski81
  • 14,409
  • 33
  • 108
  • 161
  • What is the current RAM usage by selecting all columns? – Raj Nov 19 '18 at 22:25
  • 350MB, the column i'm trying to remove is a text column that is sometimes 2-3MB large. – Kamilski81 Dec 03 '18 at 21:52
  • You could batch the records using one of the batch AR methods. – Raj Dec 03 '18 at 22:07
  • Which version of Rails are you using? – Old Pro Feb 02 '19 at 21:53
  • [My answer](https://stackoverflow.com/a/54498412/712765) generates a single SQL query that returns only the `email` and `small_value_only` columns. What don't you like about it. By the way, `joins` explicitly _avoids_ [the N+1 problem](https://stackoverflow.com/a/97308/712765). You probably mean something else. – Old Pro Feb 21 '19 at 23:36
  • Hey @OldPro, I'm not only trying to get single fields, I have a handful of fields that I need, thus pluck won't work. – Kamilski81 Feb 23 '19 at 16:25
  • 1
    You can pluck as many fields as you want, just list them in `pluck`. The example in your question only gets 2 fields. Please provide a better example if you want a different answer. – Old Pro Feb 23 '19 at 21:15

4 Answers4

4

You don't give much information on how your tables are set, so I'll assume that User belongs_to :ocr, User has an ocr_id column, and the ocr table's name is ocr.

You should use joins instead of includes, that's how you generate an INNER JOIN query and be able to retrieve the joined table's column(s). Finally you can alias the column name with AS, so it's easier to retrieve in your model:

users = User.joins(:ocr).select(:email, :ocr_id, '`ocr`.`small_value_only` AS `ocr_sma`')

users.each do |user|
  user.email # Users.email for this record
  user.ocr_sma # Joined Ocr.small_value_only for this record
end

(Obviously I aliased it ocr_sma, but you can give it the name you want)

Benjamin Bouchet
  • 12,971
  • 2
  • 41
  • 73
2

If you are really only trying to get columns of data rather than full ActiveRecord objects, you should use pluck instead of select. Speeding up these kinds of big queries is exactly what pluck was created for. Instead of all of the overhead that goes with creating all the ActiveRecord objects for all your data, pluck just returns an array of data:

values = User.includes(:ocr).pluck(:email, "ocr.small_value_only")
# SELECT "user"."email", ocr.small_value_only FROM "users"
#   LEFT OUTER JOIN "ocrs" ON "ocr"."user_id" = "users"."id"
#
# [["email_1", 3], ["email_2", 7]] # 3 and 7 are the small values

If you want ActiveRecord objects but want to skip filling in some of the data, then it is the same as above but replace includes with left_joins and pluck with select

values = User.left_joins(:ocr).select(:email, "ocr.small_value_only")
# SELECT "user"."email", ocr.small_value_only FROM "users"
#   LEFT OUTER JOIN "ocrs" ON "ocr"."user_id" = "users"."id"
#
# #<ActiveRecord::Relation [#<user id: 1>, #<user id: 2>]>

Either way, as you can see, the SELECT statement is the same and only the selected columns of data are returned.

Old Pro
  • 24,624
  • 7
  • 58
  • 106
0

I do not want to say that the following options are necessarily better than what Benj and Old Pro suggest, but rather want to provide some more alternatives.

The first one makes use of the includes method as that is what OP asked for. But as includes will by default select all columns of the primary model (User), one first has to remove that default selection via the except method:

User.includes(:ocr).except(:select).select(:id, 'ocr.small_value_only')

Of course, using join in the first place would be easier.

The following options stem from my dislike for having strings in AR queries as it tends to rely on knowledge about the called model which AR should actually abstract for you, i.e. the name of the database table (ocr).

A naive implementation for removing that knowledge is to get the table name from the model:

User.joins(:ocr).select(:email, "`#{Ocr.table_name}`.`small_value_only`")

The next alternative relies on merge to get rid of the string for selecting the column:

 User.joins(:ocr).merge(Ocr.select(:project_id)).select(:id) 

But as the resulting sql does not reference the Ocr table for small_value_only, it would be equivalent to writing:

 User.joins(:ocr).select(:email, :small_value_only) 

and will only work as long as small_value_only does not also exist on the users table.

The last alternative does not have that shortcoming but is way more verbose:

User.joins(:ocr).select(Ocr.select(:small_value_only).arel.projections).select(:id)
ulferts
  • 2,187
  • 12
  • 19
0

I think you should create a new association in User like:

belongs_to :ocr_small_value_only, -> {select('ocr.id,ocr.small_value_only')}, class_name: 'ocr', foreign_key: 'ocr_id'

and then query using

User.select('ocr_id, email').includes(:ocr_small_value_only)
Rahul
  • 23
  • 3