Merge duplicates in array of hashes

Question

I have an array of hashes in Ruby:

[
  {name: 'one', tags: 'xxx'},
  {name: 'two', tags: 'yyy'},
  {name: 'one', tags: 'zzz'},
]

and I'm looking for a clean Ruby solution that merges all the duplicates in that array (by merging I mean concatenating the tags values), so the above example would be transformed to:

[
  {name: 'one', tags: 'xxx, zzz'},
  {name: 'two', tags: 'yyy'},
]

I could iterate through each array element, check whether there is a duplicate, merge it into the original entry and delete the duplicate, but I feel there must be a better solution, and that such an approach has caveats I don't know about. Thanks for any clue.
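For comparison, here is a minimal sketch of the loop described above. It assumes the input should be left untouched, so it collects results in a new array instead of deleting from the array being iterated (deleting elements from an Array inside `each` can make the iteration skip elements). The helper name `merge_duplicates` is hypothetical.

```ruby
# Hypothetical helper: a non-destructive version of the loop described above.
def merge_duplicates(records)
  merged = []
  records.each do |rec|
    # Linear search for an earlier record with the same name (O(n^2) overall).
    existing = merged.find { |m| m[:name] == rec[:name] }
    if existing
      # Append this record's tags to the copy collected earlier.
      existing[:tags] = "#{existing[:tags]}, #{rec[:tags]}"
    else
      # dup so that appending tags later does not mutate the caller's hashes.
      merged << rec.dup
    end
  end
  merged
end

records = [
  { name: 'one', tags: 'xxx' },
  { name: 'two', tags: 'yyy' },
  { name: 'one', tags: 'zzz' },
]
result = merge_duplicates(records)
```

The `find` call makes this quadratic in the number of records, which is one reason to key the work by name with a Hash instead.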

You might find [`group_by`](http://ruby-doc.org/core-2.1.2/Enumerable.html#method-i-group_by) helpful. — Zach Kemp, Jun 15 '14 at 19:56
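A small sketch of what `group_by` produces for the array in the question, plus one way to join each group's tags. `transform_values` requires Ruby 2.4 or later (newer than the 2.1 docs linked above); the variable names are illustrative.

```ruby
arr = [
  { name: 'one', tags: 'xxx' },
  { name: 'two', tags: 'yyy' },
  { name: 'one', tags: 'zzz' },
]

# Keys are the block's return values (the names); each value is an Array
# of the original hashes that share that name, in their original order.
groups = arr.group_by { |h| h[:name] }

# Joining the tags inside each group gives the merged strings.
tags_by_name = groups.transform_values { |hs| hs.map { |h| h[:tags] }.join(', ') }
```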

score 7 · Accepted Answer · answered Jun 15 '14 at 19:57


I can think of this:

arr = [
  {name: 'one', tags: 'xxx'},
  {name: 'two', tags: 'yyy'},
  {name: 'one', tags: 'zzz'},
]

merged_array_hash = arr.group_by { |h1| h1[:name] }.map do |k, v|
  { :name => k, :tags => v.map { |h2| h2[:tags] }.join(", ") }
end

merged_array_hash
# => [{:name=>"one", :tags=>"xxx, zzz"}, {:name=>"two", :tags=>"yyy"}]
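The answer rebuilds each hash with only `:name` and `:tags`. If the hashes carry other keys, one hedged variant is to copy the first hash of each group and overwrite only `:tags`. The helper name `merge_tags` and the extra `:id` key are illustrative assumptions; for any key other than `:tags`, the first record in each group wins.

```ruby
# Hypothetical variant that keeps keys other than :name and :tags.
def merge_tags(records)
  records.group_by { |r| r[:name] }.map do |_name, group|
    # Hash#merge returns a new hash, so the input records are not modified;
    # :tags is replaced by the comma-joined tags of every record in the group.
    group.first.merge(tags: group.map { |r| r[:tags] }.join(', '))
  end
end

records = [
  { name: 'one', tags: 'xxx', id: 1 },
  { name: 'two', tags: 'yyy', id: 2 },
  { name: 'one', tags: 'zzz', id: 3 },
]
merged = merge_tags(records)
```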


Arup Rakshit


@Ven No. To make that work you would need to create a Proc object first, and then you could use it. – Arup Rakshit Jun 15 '14 at 20:00
@Ven look into the highly voted [answer](http://stackoverflow.com/questions/23695653/can-you-supply-arguments-to-the-mapmethod-syntax-in-ruby/23711606#23711606) to understand what I meant. – Arup Rakshit Jun 15 '14 at 20:06

Cary Swoveland · Answer 2 · 2014-06-20T12:40:16.367

Here's a way that uses the form of Hash#update (aka Hash#merge!) that takes a block to determine the merged value of every key present in both of the hashes being merged.
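A small illustration of that block form on two flat hashes (the sample data is made up for illustration):

```ruby
h1 = { 'one' => 'uuu', 'two' => 'vvv' }
h2 = { 'one' => 'www', 'six' => 'xxx' }

# The block is called only for keys found in both hashes ('one' here) and
# receives the key, the receiver's value and the argument's value.
merged = h1.merge(h2) { |_key, old_val, new_val| "#{old_val}, #{new_val}" }

# update (alias merge!) applies the same rule but modifies the receiver in place,
# so work on a copy to leave h1 unchanged.
h3 = h1.dup
h3.update(h2) { |_key, old_val, new_val| "#{old_val}, #{new_val}" }
```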

Code

def combine(a)
  a.each_with_object({}) do |g, h|
    h.update({ g[:name] => g }) do |k, hv, gv|
      { name: k, tags: hv[:tags] + ", " + gv[:tags] }
    end
  end.values
end

Example

a = [{name: 'one', tags: 'uuu'},
     {name: 'two', tags: 'vvv'},
     {name: 'one', tags: 'www'},
     {name: 'six', tags: 'xxx'},
     {name: 'one', tags: 'yyy'},
     {name: 'two', tags: 'zzz'}]

combine(a)
  #=> [{:name=>"one", :tags=>"uuu, www, yyy"},
  #    {:name=>"two", :tags=>"vvv, zzz"     },
  #    {:name=>"six", :tags=>"xxx"          }]

Explanation

Suppose

a = [{name: 'one', tags: 'uuu'},
     {name: 'two', tags: 'vvv'},
     {name: 'one', tags: 'www'}]

b = a.each_with_object({})
  #=> #<Enumerator: [{:name=>"one", :tags=>"uuu"},
  #                  {:name=>"two", :tags=>"vvv"},
  #                  {:name=>"one", :tags=>"www"}]:each_with_object({})>

We can convert the enumerator b to an array to see what values it will pass into its block:

b.to_a
  #=> [[{:name=>"one", :tags=>"uuu"}, {}],
  #    [{:name=>"two", :tags=>"vvv"}, {}],
  #    [{:name=>"one", :tags=>"www"}, {}]]

The first value passed to the block and assigned to the block variables is:

g,h = [{:name=>"one", :tags=>"uuu"}, {}]
g #=> {:name=>"one", :tags=>"uuu"}
h #=> {}

The first merge operation is now performed (the merged h is returned):

h.update({ g[:name] => g })
  #=> h.update({ "one" => {:name=>"one", :tags=>"uuu"} })
  #=> {"one"=>{:name=>"one", :tags=>"uuu"}}

h did not have the key "one" before the merge, so update's block is not invoked.
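To confirm when update's block runs, here is a sketch that passes the same combining rule as a counting Proc (the names `calls` and `joiner` are illustrative):

```ruby
calls = 0
# Same combining rule as in combine, but it also counts its invocations.
joiner = proc do |k, hv, gv|
  calls += 1
  { name: k, tags: hv[:tags] + ', ' + gv[:tags] }
end

h = {}
h.update({ 'one' => { name: 'one', tags: 'uuu' } }, &joiner) # new key: block skipped
calls_after_first = calls
h.update({ 'one' => { name: 'one', tags: 'www' } }, &joiner) # existing key: block runs
```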

Next, the enumerator b passes the following into the block:

g #=> {:name=>"two", :tags=>"vvv"}
h #=> {"one"=>{:name=>"one", :tags=>"uuu"}}

so we execute:

h.update({ g[:name] => g })
  #=> h.update({ "two"=>{:name=>"two", :tags=>"vvv"} })
  #=> {"one"=>{:name=>"one", :tags=>"uuu"},
  #    "two"=>{:name=>"two", :tags=>"vvv"}}

Again, h does not have the key "two", so the block is not used.

Lastly, each_with_object passes the final tuple into the block:

g #=> {:name=>"one", :tags=>"www"}
h #=> {"one"=>{:name=>"one", :tags=>"uuu"},
  #    "two"=>{:name=>"two", :tags=>"vvv"}}

and we execute:

h.update({ g[:name] => g })
  #=> h.update({ "one"=>{:name=>"one", :tags=>"www"} })

h has a key/value pair with key "one":

"one"=>{:name=>"one", :tags=>"uuu"}

update's block is therefore executed to determine the merged value. The following values are passed to that block's variables:

k #=> "one"
hv #=> {:name=>"one", :tags=>"uuu"} <h's value for "one">
gv #=> {:name=>"one", :tags=>"www"} <g's value for "one">

and the block calculation creates this hash (as the merged value for the key "one"):

{ name: k, tags: hv[:tags]+", "+gv[:tags] }
  #=> { name: "one", tags: "uuu" + ", " + "www" }
  #=> { name: "one", tags: "uuu, www" }

So the merged hash now becomes:

h #=> {"one"=>{:name=>"one", :tags=>"uuu, www"},
  #    "two"=>{:name=>"two", :tags=>"vvv"     }}

All that remains is to extract the values:

h.values  
  #=> [{:name=>"one", :tags=>"uuu, www"}, {:name=>"two", :tags=>"vvv"}]
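To reproduce this walkthrough on your own data, a hypothetical instrumented copy of `combine` can print `g` and `h` at each step. The exact printed format depends on your Ruby version's `Hash#inspect`, so no output is shown here.

```ruby
# Hypothetical instrumented copy of combine: prints g and h on every step.
def combine_traced(a)
  a.each_with_object({}) do |g, h|
    puts "g = #{g.inspect}"
    h.update({ g[:name] => g }) do |k, hv, gv|
      # Only reached when h already holds an entry for this name.
      puts "  update's block called for #{k.inspect}"
      { name: k, tags: hv[:tags] + ', ' + gv[:tags] }
    end
    puts "h = #{h.inspect}"
  end.values
end

a = [{ name: 'one', tags: 'uuu' },
     { name: 'two', tags: 'vvv' },
     { name: 'one', tags: 'www' }]
result = combine_traced(a)
```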
