How to "split and group" an array of objects based on one of their properties

Question

Context and Code Examples

I have an Array with instances of a class called TimesheetEntry.

Here is the constructor for TimesheetEntry:

  def initialize(parameters = {})
    @date       = parameters.fetch(:date)
    @project_id = parameters.fetch(:project_id)
    @article_id = parameters.fetch(:article_id)
    @hours      = parameters.fetch(:hours)
    @comment    = parameters.fetch(:comment)
  end

I create an array of TimesheetEntry objects with data from a .csv file:

  timesheet_entries = []
  CSV.parse(source_file, csv_parse_options).each do |row|
    timesheet_entries.push(TimesheetEntry.new(
      :date       => Date.parse(row['Date']),
      :project_id => row['Project'].to_i,
      :article_id => row['Article'].to_i,
      :hours      => row['Hours'].gsub(',', '.').to_f,
      :comment    => row['Comment'].to_s.empty? ? "N/A" : row['Comment']
    ))
  end

I also have a Set of Hashes, each containing two elements, created like this:

  all_timesheets = Set.new []
  timesheet_entries.each do |entry|
    all_timesheets << { 'date' => entry.date, 'entries' => [] }
  end

Now I want to populate the 'entries' Array inside each Hash with TimesheetEntry objects. Each Array must contain only the TimesheetEntry objects for that Hash's date.

I have done that like this:

  timesheet_entries.each do |entry|
    all_timesheets.each do |timesheet|
      if entry.date == timesheet['date']
        timesheet['entries'].push entry
      end
    end
  end

While this approach gets the job done, it's not very efficient (I'm fairly new to this).

Question

What would be a more efficient way of achieving the same end result? In essence, I want to "split" the Array of TimesheetEntry objects, "grouping" objects with the same date.

score 3 · Accepted Answer · edited May 23 '17 at 12:11

You can fix the performance problem by replacing the Set with a Hash keyed by date. A Hash is a dictionary-like data structure, so finding the group for a given date does not require scanning every element.

This means that your inner loop all_timesheets.each do |timesheet| ... if entry.date ... will simply be replaced by a more efficient hash lookup: all_timesheets[entry.date].

Also, there's no need to create the keys in advance and then populate the date groups. These can both be done in one go:

all_timesheets = {}

timesheet_entries.each do |entry|
  all_timesheets[entry.date] ||= []  # create the key if it's not already there
  all_timesheets[entry.date] << entry
end

A nice thing about hashes is that you can customize their behavior when a non-existing key is encountered. You can use the constructor that takes a block to specify what happens in this case. Let's tell our hash to automatically add new keys and initialize them with an empty array. This allows us to drop the all_timesheets[entry.date] ||= [] line from the above code:

all_timesheets = Hash.new { |hash, key| hash[key] = [] }

timesheet_entries.each do |entry|
  all_timesheets[entry.date] << entry
end
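
One detail worth noting about the block form: `Hash.new([])` looks similar but behaves differently, because it hands back the same default array for every missing key and never stores it under that key. A minimal sketch of the difference, using made-up string keys and symbol values in place of real dates and entries:

```ruby
# Default *value*: a single shared array that is never assigned to any key.
shared = Hash.new([])
shared['2015-01-16'] << :a
shared['2015-01-17'] << :b

# Default *block*: a fresh array is created and stored for each missing key.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key['2015-01-16'] << :a
per_key['2015-01-17'] << :b

shared_keys  = shared.keys   # nothing was ever stored in the hash
per_key_keys = per_key.keys  # both keys exist, each with its own array
```

With the default value, both pushes land in the one shared array and the hash itself stays empty, which is why the block form is the one to use for grouping.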

There is, however, an even more concise way of achieving this grouping, using the Enumerable#group_by method:

all_timesheets = timesheet_entries.group_by { |e| e.date }

And, of course, there's a way to make this even more concise, by passing the method name as a symbol with & (the Symbol#to_proc shorthand):

all_timesheets = timesheet_entries.group_by(&:date)
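
To make the shape of the result concrete, here is a small self-contained sketch. The `Entry` struct and the sample values are stand-ins invented for illustration, not the asker's actual `TimesheetEntry` class or data.

```ruby
require 'date'

# Hypothetical stand-in for TimesheetEntry, with only the fields needed here.
Entry = Struct.new(:date, :hours)

entries = [
  Entry.new(Date.new(2015, 1, 15), 2.5),
  Entry.new(Date.new(2015, 1, 16), 4.0),
  Entry.new(Date.new(2015, 1, 15), 1.5)
]

# Keys are the distinct dates; each value is an array of the entries
# with that date, in their original order.
by_date = entries.group_by(&:date)

# If the original 'date'/'entries' layout is still needed, it can be
# rebuilt from the grouped hash.
as_records = by_date.map { |date, group| { 'date' => date, 'entries' => group } }
```

Here `by_date` has two keys, and the group for January 15 holds the first and third entries.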

Thanks! Learned several new things from your answer. I always thought `Set` was quicker than `Hash` or `Array` for filtering out unique items, since they disregard duplicates automatically when you try to add them; I'll need to dig into that. Also, I did not know about `||=` or `group_by`. — leifericf, Jan 16 '15 at 18:32
I replaced nearly all of the above code with this one line as you suggested: `all_timesheets = timesheet_entries.group_by(&:date)` and it's much faster, too. Man, I love Ruby. Thanks again for the pointers. — leifericf, Jan 16 '15 at 18:46
