I am trying to write a batch process that takes a parameter specifying the number of background workers and splits a collection into that many arrays. For example, given:

def split_for_batch(number_of_workers)
  <code>
end

array =  [1,2,3,4,5,6,7,8,9,10]

array.split_for_batch(3)  

=> [[1,2,3],[4,5,6],[7,8,9,10]]

The catch is that I don't want to load all of the users into memory at once, because this is a batch job. What I have now is:

def initialize_audit_run_threads
  total_users = tax_audit_run_users.count
  partition_size = (total_users / thread_count).round
  tax_audit_run_users.in_groups_of(partition_size).each do |group|
    thread = TaxAuditRunThread.create(:tax_audit_run_id => id, :status_code => 1)
    group.each do |user|
      if user
        user.tax_audit_run_thread_id = thread.id
        user.save
      end
    end
  end
end

Here thread_count is an attribute of the class that determines the number of background workers. Currently this code creates 4 threads rather than 3. I have also tried find_in_batches, but I hit the same problem: with 10 tax_audit_run_users in the array, I have no way to tell the last worker to process the last record. Is there a way in Ruby or Rails to divide a collection into n parts and have the last part include the stragglers?

ruby_newbie
  • Is it necessary to mention batch, threads, etc. here? Extract the core problem that you want to ask. It looks like you are just asking a way to chunk an array in a certain way. But that is blurred because of all the extra things you wrote. It is hard to follow your question. – sawa Sep 30 '14 at 20:51
  • I was thinking that it was important to mention that aspect because I don't want to load all of the objects into memory at once. I tried to distill it down in the first part but even if I had a way to do the first section of code, I would still be unable to use it due to the constraints of the batch size – ruby_newbie Sep 30 '14 at 20:54
  • "I don't want to load all of the objects into memory at once." If you're loading data from a table, then don't retrieve every record at once. There are multiple ways to selectively return chunks of data, depending on the DBM, but Active Record should be able to abstract that away for you. – the Tin Man Sep 30 '14 at 21:18
  • Care to elaborate? Sorry, I am new to Ruby. I think there is a way to do this, but I have searched and tried several things without success. – ruby_newbie Sep 30 '14 at 21:20
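
One possible reading of the Tin Man's comment, as a minimal sketch rather than a tested solution: instead of splitting loaded records, split the record *count* into one window per worker and let each worker fetch only its own window. The helper name `batch_windows` is hypothetical, and the Active Record usage shown in the trailing comment is an assumption about how the windows might be consumed.

```ruby
# Hypothetical helper: for `total` records and `workers` workers, return one
# [offset, limit] pair per worker. Every worker gets total / workers records
# (integer division) and the last worker also takes the remainder.
def batch_windows(total, workers)
  raise ArgumentError, "workers must be positive" unless workers > 0

  size = total / workers
  (0...workers).map do |i|
    offset = i * size
    # The final window runs to the end so the stragglers are included.
    limit = (i == workers - 1) ? total - offset : size
    [offset, limit]
  end
end

# Assumed usage inside the model (not verified against the asker's schema):
#   batch_windows(tax_audit_run_users.count, thread_count).each do |offset, limit|
#     users = tax_audit_run_users.order(:id).offset(offset).limit(limit)
#     ...
#   end
```

With 10 records and 3 workers this gives the windows `[0, 3]`, `[3, 3]` and `[6, 4]`, so only the count is needed up front and no worker has to hold more than its own slice.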

1 Answer

See the question "How to split (chunk) a Ruby array into parts of X elements?"

You will of course need to modify it a bit to handle a last chunk that is smaller than the chunk size, for example by appending it to the previous chunk; that part is up to you.
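
A minimal sketch of that modification in plain Ruby, assuming the collection is already an in-memory array. The name `split_for_batch` comes from the question, but writing it as a standalone method that takes the array as an argument is my own choice for illustration.

```ruby
# Split `array` into exactly `parts` consecutive chunks. Each chunk holds
# array.size / parts elements (integer division); the last chunk also
# absorbs whatever is left over.
def split_for_batch(array, parts)
  raise ArgumentError, "parts must be positive" unless parts > 0

  size = array.size / parts
  # The first parts - 1 chunks are fixed-size slices.
  chunks = Array.new(parts - 1) { |i| array[i * size, size] }
  # The last chunk runs to the end of the array, stragglers included.
  chunks << array[((parts - 1) * size)..-1]
  chunks
end

split_for_batch([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)
# => [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]
```

Note that when the array has fewer elements than `parts`, the leading chunks come back empty and everything lands in the last one. If the stragglers do not have to be in the last part, ActiveSupport's `Array#in_groups(number, false)` also returns exactly `number` groups, but it spreads the extra elements across the first groups instead.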

mattforni
  • I read that post which is where I got in_groups_of from. I still can't seem to find a way to handle this without loading all of it into memory at once. – ruby_newbie Sep 30 '14 at 20:57
  • Are you loading the tax_audit_run_users from the database? With the code you posted you've already loaded all of the users into memory. If you don't want to load all of the TaxAuditRunThread objects into memory, you're not. Garbage collection should clean those up when you leave the 'group' block. Maybe you can elaborate on what you mean by **"all of it"** as that is a **really** ambiguous phrase. – mattforni Sep 30 '14 at 21:35