
I have an array of hashes coming from a DynamoDB table that I need to group by one key while summing the values of another key. My array looks like this:

data = [
  { 'state' => 'Florida', 'minutes_of_sun' => 10, 'timestamp' => 1497531600, 'region' => 'Southeast' },
  { 'state' => 'Florida', 'minutes_of_sun' => 7, 'timestamp' => 1497531600, 'region' => 'Southeast' },
  { 'state' => 'Florida', 'minutes_of_sun' => 2, 'timestamp' => 1497531600, 'region' => 'Southeast' },
  { 'state' => 'Georgia', 'minutes_of_sun' => 15, 'timestamp' => 1497531600, 'region' => 'Southeast' },
  { 'state' => 'Georgia', 'minutes_of_sun' => 5, 'timestamp' => 1497531600, 'region' => 'Southeast' }
]

The end result I'm looking for is:

data = [
  { 'state' => 'Florida', 'minutes_of_sun' => 19, 'region' => 'Southeast' },
  { 'state' => 'Georgia', 'minutes_of_sun' => 20, 'region' => 'Southeast' }
]

I've been able to do this with the method below, but it's slow and clunky. Is there a faster way to do this, ideally with fewer lines of code?

def combine_data(data)
  combined_data = []

  data.each do |row|
    existing_data = combined_data.find { |key| key['state'] == row['state'] }
    if existing_data.present?
      existing_data['minutes_of_sun'] += row['minutes_of_sun']
    else
      combined_data << row
    end
  end

  combined_data
end
PamB
  • `data.group_by { |h| h['state'] }.values.map { |hs| hs.inject { |a, b| a.merge(b) { |key, oldval, newval| oldval + newval } } }` – falsetru Jun 15 '17 at 14:49
  • `data.group_by { |h| h['state'] }.values.map { |hs| hs.inject { |a, b| a['minutes_of_sun'] += b['minutes_of_sun']; a } }` (if you don't mind your original `data` hash modified) – falsetru Jun 15 '17 at 14:51
  • Possible duplicate of [Sum values in array of hash if they have the same value](https://stackoverflow.com/questions/43876712/sum-values-in-array-of-hash-if-they-have-the-same-value). This question has been asked and answered multiple times (did you try searching first?); a quick Google search turned up dozens of results. – engineersmnky Jun 15 '17 at 15:06

2 Answers

Try this one

data.group_by { |item| item['state'] }.values.map do |arr|
  h = arr.first
  h.delete('timestamp')
  # Start from 0 so the accumulator is a number, not the first hash
  h.merge('minutes_of_sun' => arr.inject(0) { |acc, item| acc + item['minutes_of_sun'] })
end
 => [{"state"=>"Florida", "minutes_of_sun"=>19, "region"=>"Southeast"}, {"state"=>"Georgia", "minutes_of_sun"=>20, "region"=>"Southeast"}]

From Ruby 2.4.0 onward, you can use sum instead of inject:

data.group_by { |item| item['state'] }.values.map do |arr| 
  h = arr.first
  h.delete('timestamp')
  h.merge('minutes_of_sun' => arr.sum { |item| item['minutes_of_sun'] }) 
end
 => [{"state"=>"Florida", "minutes_of_sun"=>19, "region"=>"Southeast"}, {"state"=>"Georgia", "minutes_of_sun"=>20, "region"=>"Southeast"}]
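Note that `h.delete('timestamp')` removes the key from the first hash of each group in the original `data`. If that matters, here is a minimal sketch of a non-mutating variant, assuming Ruby 2.4+ for `sum`. The method name `sum_minutes_by_state` and the sample `rows` are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: groups rows by 'state' and sums 'minutes_of_sun'
# without modifying the input hashes.
def sum_minutes_by_state(rows)
  rows.group_by { |row| row['state'] }.map do |_state, group|
    # Hash#reject returns a new hash, so the original row keeps its 'timestamp'
    base = group.first.reject { |key, _| key == 'timestamp' }
    base.merge('minutes_of_sun' => group.sum { |row| row['minutes_of_sun'] })
  end
end

# Sample input, made up for this sketch
rows = [
  { 'state' => 'Florida', 'minutes_of_sun' => 10, 'timestamp' => 1497531600, 'region' => 'Southeast' },
  { 'state' => 'Florida', 'minutes_of_sun' => 7,  'timestamp' => 1497531600, 'region' => 'Southeast' },
  { 'state' => 'Georgia', 'minutes_of_sun' => 15, 'timestamp' => 1497531600, 'region' => 'Southeast' }
]

combined = sum_minutes_by_state(rows)
```

Here `combined` holds one hash per state, and every hash in `rows` still has its `'timestamp'` key.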
Ursus

You can use the form of Hash#update (aka merge!) that takes a block to determine the values of keys that are present in both hashes being merged. The block receives three variables: the common key, the value for that key in the receiver, and the value for that key in the hash being merged in. See the doc for details.
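As a minimal sketch of that block form on two small hashes (the names `totals` and `incoming` are hypothetical and not part of the question):

```ruby
totals   = { 'Georgia' => 15 }
incoming = { 'Georgia' => 5, 'Florida' => 10 }

# The block is called only for keys present in both hashes:
#   key     - the shared key ('Georgia' here)
#   old_val - the value already in the receiver (15)
#   new_val - the value in the hash being merged in (5)
# Keys found in only one hash ('Florida') are copied without calling the block.
totals.update(incoming) { |key, old_val, new_val| old_val + new_val }
```

After the call, `totals` has been modified in place: `'Georgia'` holds the sum of both values and `'Florida'` has been added as-is.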

data = [
  { 'state'=>'Florida', 'sun_min'=>10, 'stamp'=>149, 'region'=>'SE' },
  { 'state'=>'Georgia', 'sun_min'=>15, 'stamp'=>149, 'region'=>'SE' },
  { 'state'=>'Georgia', 'sun_min'=> 5, 'stamp'=>149, 'region'=>'SE' }
]

data.each_with_object({}) do |g,h|
  h.update(g['state']=>g.reject { |k,_| k=='stamp' }) do |_,o,n|
    o.merge('sun_min'=>o['sun_min']+n['sun_min'])
  end
end.values
  #=> [{"state"=>"Florida", "sun_min"=>10, "region"=>"SE"},
  #    {"state"=>"Georgia", "sun_min"=>20, "region"=>"SE"}]

Note that without .values this returns

#=> {"Florida"=>{"state"=>"Florida", "sun_min"=>10, "region"=>"SE"},
#    "Georgia"=>{"state"=>"Georgia", "sun_min"=>20, "region"=>"SE"}}
Cary Swoveland