I'd like to use something like find_in_batches, but instead of batching fully instantiated AR objects, I would like to batch a single attribute, for example the id. So, basically, a mixture of find_in_batches and pluck:

Cars.where(:engine => "Turbo").pluck(:id).find_in_batches do |ids|
  puts ids
end

# [1, 2, 3....]
# ...

Is there a way to do this (maybe with Arel) without having to write the OFFSET/LIMIT logic myself or resorting to pagination gems like will_paginate or kaminari?

Mischa
ChuckE

1 Answer

This is not the ideal solution, but here's a method that copies most of find_in_batches and yields a relation instead of an array of records (untested). Just monkey-patch it into ActiveRecord::Relation:

def in_batches(options = {})
  relation = self

  # Any order or limit already on the scope is overridden below.
  unless arel.orders.blank? && arel.taken.blank?
    ActiveRecord::Base.logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
  end

  if (finder_options = options.except(:start, :batch_size)).present?
    raise "You can't specify an order, it's forced to be #{batch_order}" if options[:order].present?
    raise "You can't specify a limit, it's forced to be the batch_size"  if options[:limit].present?

    relation = apply_finder_options(finder_options)
  end

  start = options.delete(:start)
  batch_size = options.delete(:batch_size) || 1000

  # Order by primary key and cap each batch at batch_size rows.
  relation = relation.reorder(batch_order).limit(batch_size)
  relation = relation.where(table[primary_key].gteq(start)) if start

  # Stop once a batch comes back empty.
  while (size = relation.size) > 0
    yield relation

    # A short batch means there is nothing left to fetch.
    break if size < batch_size

    # The next batch starts after the last primary key of this one.
    primary_key_offset = relation.last.id
    if primary_key_offset
      relation = relation.where(table[primary_key].gt(primary_key_offset))
    else
      raise "Primary key not included in the custom select clause"
    end
  end
end

With this, you should be able to do:

Cars.where(:engine => "Turbo").in_batches do |relation|
  relation.pluck(:id)
end

This is not the best implementation possible (especially with regard to the primary_key_offset calculation, which instantiates a record), but you get the idea.
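The record instantiation noted above can be avoided by plucking the primary keys directly and using the last plucked value as the next starting point. The following is only a sketch under assumptions, not a tested drop-in: `pluck_ids_in_batches` is a hypothetical helper name, it uses keyword arguments (Ruby 2.0+), and it assumes `relation` responds to `reorder`, `limit`, `where` and `pluck` the way an ActiveRecord relation does, with an orderable primary key column whose name is unambiguous in the query.

```ruby
# Hypothetical helper (not part of ActiveRecord): yields arrays of primary
# keys, batch_size at a time, without instantiating any records.
# Assumes `relation` behaves like an ActiveRecord relation and that
# `primary_key` names an orderable, unambiguous column.
def pluck_ids_in_batches(relation, batch_size: 1000, primary_key: "id")
  # Force primary key order and cap every query at batch_size rows.
  batch = relation.reorder(primary_key).limit(batch_size)
  ids = batch.pluck(primary_key)

  until ids.empty?
    yield ids

    # A short batch means there is nothing left to fetch.
    break if ids.size < batch_size

    # Continue after the last key seen in this batch.
    ids = batch.where("#{primary_key} > ?", ids.last).pluck(primary_key)
  end
end
```

Assuming the model from the question, usage would look like pluck_ids_in_batches(Cars.where(:engine => "Turbo"), batch_size: 500) { |ids| puts ids }. Since it is a plain method taking the relation as an argument, nothing has to be patched into ActiveRecord. Rails 5.0 and later also ship a built-in in_batches that yields relations.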

m_x
  • It was a good try, indeed. But I'd prefer a non-monkey-patching solution, since I want to apply this logic in a library that can be distributed across several projects, and I'd like to be as unintrusive as possible regarding code injection into AR. – ChuckE Mar 21 '13 at 16:54
  • Good point...so make it a class method. I guess your lib will be included in models anyway, and it should work just the same as long as AR is used. – m_x Mar 22 '13 at 07:38
  • Ended up using .limit and .offset ARel methods. I'm just saying, it'd be nice to have such an out-of-the-box solution for Rails. – ChuckE Mar 22 '13 at 11:25
  • @ChuckE you should post your code as an answer. I'd love to see what you came up with. – drewish Feb 28 '14 at 00:39
  • https://github.com/TiagoCardoso1983/association_observers/blob/master/lib/association_observers/orm/active_record.rb – ChuckE Feb 28 '14 at 19:44
  • There's a good answer here: https://stackoverflow.com/questions/28391320/in-rails-3-2-how-to-pluck-in-batches-for-a-very-large-table – Peter Ehrlich May 11 '23 at 13:02
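
The .limit/.offset approach mentioned in the comments can be sketched roughly as follows. This is an illustration under assumptions, not ChuckE's actual code (see the linked repository for that): `pluck_in_offset_batches` is a hypothetical name, it uses keyword arguments (Ruby 2.0+), and it assumes the relation passed in already has a stable order and responds to `limit`, `offset` and `pluck` as an ActiveRecord relation does.

```ruby
# Hypothetical helper: plucks one column in OFFSET/LIMIT pages.
# Assumes `relation` is already ordered deterministically; without a
# stable order, pages may overlap or skip rows.
def pluck_in_offset_batches(relation, column, batch_size: 1000)
  offset = 0

  loop do
    values = relation.limit(batch_size).offset(offset).pluck(column)
    break if values.empty?

    yield values

    # A short page means the end of the result set was reached.
    break if values.size < batch_size

    offset += batch_size
  end
end
```

Large offsets tend to get slower on big tables because the database still has to walk past the skipped rows, which is why find_in_batches filters on the primary key instead.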