
There are two arrays of hashes, and I want to remove the 'common' elements from the two arrays, based on certain keys. For example:

array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]

and the criteria keys are a and b. So when I compute something like

array1 - array2 (I don't have to override '-' if there's a better approach)

I expect to get [{a: '4', b: '5', c:'6'}], since a and b are the comparison criteria. The first element of array1 is removed because its values for a and b match those of array2.first; array1.last is kept because its value for a differs from that of array2.last.
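To pin down the intended semantics, here is a minimal brute-force sketch (not taken from either answer below): keep each hash of array1 unless some hash of array2 has equal values for every criteria key. The method name `subtract_by_keys` is hypothetical, and the nested scan costs O(n*m) comparisons.

```ruby
# Hypothetical helper: keep hashes in array1 unless some hash in array2
# agrees with it on every key in `keys`. Brute force: O(n * m) comparisons.
def subtract_by_keys(array1, array2, keys)
  array1.reject do |h1|
    array2.any? do |h2|
      # Both hashes must contain every key, and the values must be equal.
      keys.all? { |k| h1.key?(k) && h2.key?(k) && h1[k] == h2[k] }
    end
  end
end

array1 = [{a: '1', b: '2', c: '3'}, {a: '4', b: '5', c: '6'}]
array2 = [{a: '1', b: '2', c: '10'}, {a: '3', b: '5', c: '6'}]
p subtract_by_keys(array1, array2, [:a, :b])
```

On the question's data this keeps only the second hash of array1.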

Bruce Lin
    Your question is unclear. How do you intend to get that result? What do the keys have to do with it? – engineersmnky May 27 '15 at 23:46
  • You can have the hold released if you edit to clarify. If the first paragraph of my answer is accurate, feel free to use it, modified or *verbatim*. Also, I suggest you explain why, in your example, `array1.last` was kept but `array1.first` was not. – Cary Swoveland May 28 '15 at 17:22

2 Answers


As I understand it, you are given two arrays of hashes and a set of keys. You want to reject all elements (hashes) of the first array whose values match the values of any element (hash) of the second array, for all specified keys. You can do that as follows.

Code

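require 'set'

def reject_partial_dups(array1, array2, keys)
  # Collect the key values of every hash in array2 that has all of the keys.
  set2 = array2.each_with_object(Set.new) do |h,s|
    s << h.values_at(*keys) if (keys-h.keys).empty?
  end
  # Drop hashes of array1 that have all of the keys and whose values are in set2.
  array1.reject do |h|
    (keys-h.keys).empty? && set2.include?(h.values_at(*keys))
  end
end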

The line:

(keys-h.keys).empty? && set2.include?(h.values_at(*keys))

can be simplified to:

set2.include?(h.values_at(*keys))

if no hash in array1 has a nil value for any of the keys (counting a missing key as nil). I created a set (rather than an array) from array2 to speed up the lookup of h.values_at(*keys) in that line: Set#include? is a hash lookup, whereas Array#include? scans the whole array.
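To see why the key check matters when that condition fails, here is a sketch comparing the answer's method with the simplified variant (the name `reject_partial_dups_simple` is hypothetical). Hash#values_at returns nil for a missing key, so a hash in array1 that lacks :b can collide with a hash in array2 whose :b is explicitly nil.

```ruby
require 'set'

# The answer's version: only hashes containing every key take part.
def reject_partial_dups(array1, array2, keys)
  set2 = array2.each_with_object(Set.new) do |h, s|
    s << h.values_at(*keys) if (keys - h.keys).empty?
  end
  array1.reject do |h|
    (keys - h.keys).empty? && set2.include?(h.values_at(*keys))
  end
end

# Hypothetical simplified variant: drops the key check on array1's side.
def reject_partial_dups_simple(array1, array2, keys)
  set2 = array2.each_with_object(Set.new) do |h, s|
    s << h.values_at(*keys) if (keys - h.keys).empty?
  end
  array1.reject { |h| set2.include?(h.values_at(*keys)) }
end

keys   = [:a, :b]
array1 = [{a: '1'}]          # no :b key, so values_at gives ['1', nil]
array2 = [{a: '1', b: nil}]  # has :b (explicitly nil), so set2 holds ['1', nil]

p reject_partial_dups(array1, array2, keys)        # keeps {a: '1'}
p reject_partial_dups_simple(array1, array2, keys) # rejects it
```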

Example

keys = [:a, :b]
array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}, {a: 1, c: 4}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]
reject_partial_dups(array1, array2, keys)
  #=> [{:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}] 

Explanation

First, create set2:

e0 = array2.each_with_object(Set.new)
  #=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"10"}, {:a=>"3", :b=>"5", :c=>"6"}]
  #     :each_with_object(#<Set: {}>)> 

Pass the first element of e0 to the block and perform the block calculation.

h,s = e0.next
  #=> [{:a=>"1", :b=>"2", :c=>"10"}, #<Set: {}>]
h #=> {:a=>"1", :b=>"2", :c=>"10"} 
s #=> #<Set: {}> 
(keys-h.keys).empty?
  #=> ([:a,:b]-[:a,:b,:c]).empty? => [].empty? => true

so compute:

s << h.values_at(*keys)
  #=> s << {:a=>"1", :b=>"2", :c=>"10"}.values_at(*[:a,:b])
  #=> s << ["1","2"] => #<Set: {["1", "2"]}> 

Pass the second (last) element of e0 to the block:

h,s = e0.next
  #=> [{:a=>"3", :b=>"5", :c=>"6"}, #<Set: {["1", "2"]}>] 
(keys-h.keys).empty?
  #=> true

so compute:

s << h.values_at(*keys)
  #=> #<Set: {["1", "2"], ["3", "5"]}> 

set2
  #=> #<Set: {["1", "2"], ["3", "5"]}> 
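The same set2 can also be built without each_with_object. This sketch uses select, map, and Enumerable#to_set (provided by the set library) and checks that both constructions agree on the example data:

```ruby
require 'set'

keys   = [:a, :b]
array2 = [{a: '1', b: '2', c: '10'}, {a: '3', b: '5', c: '6'}]

# Version from the answer: accumulate into a Set while iterating.
set2 = array2.each_with_object(Set.new) do |h, s|
  s << h.values_at(*keys) if (keys - h.keys).empty?
end

# Alternative: keep hashes that have every key, project them, convert to a Set.
set2_alt = array2.select { |h| (keys - h.keys).empty? }
                 .map { |h| h.values_at(*keys) }
                 .to_set

p set2 == set2_alt
p set2.include?(['1', '2'])
```

The each_with_object version makes a single pass; the chained version builds two intermediate arrays but may read more directly.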

Reject elements from array1

We now iterate through array1, rejecting elements for which the block evaluates to true.

e1 = array1.reject
  #=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"3"},
  #                  {:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]:reject> 

The first element of e1 is passed to the block:

h = e1.next
  #=> {:a=>"1", :b=>"2", :c=>"3"} 
a = (keys-h.keys).empty?
  #=> ([:a,:b]-[:a,:b,:c]).empty? => true
b = set2.include?(h.values_at(*keys))
  #=> set2.include?(["1","2"]) => true
a && b
  #=> true

so the first element of e1 is rejected. Next:

h = e1.next
  #=> {:a=>"4", :b=>"5", :c=>"6"} 
a = (keys-h.keys).empty?
  #=> true 
b = set2.include?(h.values_at(*keys))
  #=> set2.include?(["4","5"]) => false
a && b
  #=> false

so the second element of e1 is not rejected. Lastly:

h = e1.next
  #=> {:a=>1, :c=>4} 
a = (keys-h.keys).empty?
  #=> ([:a,:b]-[:a,:c]).empty? => [:b].empty? => false

so the block returns false (meaning the last element of e1 is not rejected), and there is no need to compute:

b = set2.include?(h.values_at(*keys))
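Skipping that computation relies on Ruby's short-circuiting &&: when the left operand is false or nil, the right operand is never evaluated. This sketch counts how often the set lookup actually runs on the example data, using a hypothetical `lookups` counter added only for the demonstration:

```ruby
require 'set'

keys   = [:a, :b]
set2   = Set[['1', '2'], ['3', '5']]
array1 = [{a: '1', b: '2', c: '3'}, {a: '4', b: '5', c: '6'}, {a: 1, c: 4}]

lookups = 0  # hypothetical counter, only for this demonstration
kept = array1.reject do |h|
  # The begin/end block runs only when the left-hand side is truthy.
  (keys - h.keys).empty? && begin
    lookups += 1
    set2.include?(h.values_at(*keys))
  end
end

p kept
p lookups
```

The lookup runs for the first two hashes only; the third lacks :b, so the left operand is false.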
Cary Swoveland
  • You deserve an upvote because, damn, that's a lot to write lol – Amir Raminfar May 28 '15 at 03:49
    Thanks, @AmirRaminfar. I realize that for > 90% of readers this is gross overkill. They will look at the code and know what's going on. I often include such gory detail because I think it's helpful for newbies to be able to work through the details, particularly to help them understand how enumerators work. – Cary Swoveland May 28 '15 at 03:57
  • @CarySwoveland Thanks for the detailed answer! It's really clear and helpful. – Bruce Lin May 28 '15 at 18:30
  • @CarySwoveland, when you get a chance, can you peek over at https://stackoverflow.com/questions/45336535/ruby-show-deltas-between-2-array-of-hashes-based-on-subset-of-hash-keys and confirm my suspicion that your solution here will also solve my problem? Thanks in advance! – Kurt W Jul 31 '17 at 21:03
  • @CarySwoveland, would you be so kind as to suggest how I might go about modifying this to show deltas on both sides of two similar data sets? In my data, `array1` is an array of hashes for July and `array2` an array of hashes for Aug. I'd like to be able to see what was in July that no longer exists in August (fixed), what was not in July that now exists in August (found), and what is the same. I'm using your short answer above and only focused on a subset of keys. I believe the answer above produces what exists in August that didn't in July; how might I adapt it to show all 3 "sides"? Thx! – Kurt W Aug 29 '17 at 03:32
  • @KurtW, I'll have a look at that tomorrow AM... – Cary Swoveland Aug 29 '17 at 06:35
  • @CarySwoveland, you're the best. Thank you so much. – Kurt W Aug 29 '17 at 15:52

So you really should try this out yourself because I am basically solving it for you.

The general approach would be:

  1. For every item in array1
  2. Check whether the hash at the same index in array2 has any of the same keys with the same values
  3. If it does, delete those key/value pairs

You would probably end up with something like:

array1.each_with_index { |h, i| h.delete_if {|k,v| array2[i].has_key?(k) && array2[i][k] == v } }
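A quick check of the one-liner on the question's data (not part of the original answer) shows what it does: it pairs hashes by index and deletes matching key/value pairs inside each hash of array1, mutating array1 in place, rather than removing whole hashes. It also assumes array2 is at least as long as array1, since array2[i] would otherwise be nil.

```ruby
array1 = [{a: '1', b: '2', c: '3'}, {a: '4', b: '5', c: '6'}]
array2 = [{a: '1', b: '2', c: '10'}, {a: '3', b: '5', c: '6'}]

# The one-liner from this answer, spread over several lines.
# Hash#delete_if removes pairs from each hash of array1 in place.
array1.each_with_index do |h, i|
  h.delete_if { |k, v| array2[i].has_key?(k) && array2[i][k] == v }
end

p array1
```

After running, each hash of array1 retains only the pairs that differ from the hash at the same index in array2.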

Amir Raminfar
    Note that I just wrote it and it worked. But I didn't really test it. You should try it yourself. It's pretty easy with `irb` – Amir Raminfar May 27 '15 at 23:47