As I understand, you are given two arrays of hashes and a set of keys. You want to reject all elements (hashes) of the first array whose values match the values of any element (hash) of the second array, for all specified keys. You can do that as follows.
Code
require 'set'
def reject_partial_dups(array1, array2, keys)
set2 = array2.each_with_object(Set.new) do |h,s|
s << h.values_at(*keys) if (keys-h.keys).empty?
end
array1.reject do |h|
(keys-h.keys).empty? && set2.include?(h.values_at(*keys))
end
end
The line:
(keys-h.keys).empty? && set2.include?(h.values_at(*keys))
can be simplified to:
set2.include?(h.values_at(*keys))
if none of the values of keys in the elements (hashes) of array1
are nil
. I created a set (rather than an array) from array2
in order to speed the lookup of h.values_at(*keys)
in that line.
Example
keys = [:a, :b]
array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}, {a: 1, c: 4}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]
reject_partial_dups(array1, array2, keys)
#=> [{:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]
Explanation
First create set2
e0 = array2.each_with_object(Set.new)
#=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"10"}, {:a=>"3", :b=>"5", :c=>"6"}]
# #:each_with_object(#<Set: {}>)>
Pass the first element of e0
and perform the block calculation.
h,s = e0.next
#=> [{:a=>"1", :b=>"2", :c=>"10"}, #<Set: {}>]
h #=> {:a=>"1", :b=>"2", :c=>"10"}
s #=> #<Set: {}>
(keys-h.keys).empty?
#=> ([:a,:b]-[:a,:b,:c]).empty? => [].empty? => true
so compute:
s << h.values_at(*keys)
#=> s << {:a=>"1", :b=>"2", :c=>"10"}.values_at(*[:a,:b] }
#=> s << ["1","2"] => #<Set: {["1", "2"]}>
Pass the second (last) element of e0
to the block:
h,s = e0.next
#=> [{:a=>"3", :b=>"5", :c=>"6"}, #<Set: {["1", "2"]}>]
(keys-h.keys).empty?
#=> true
so compute:
s << h.values_at(*keys)
#=> #<Set: {["1", "2"], ["3", "5"]}>
set2
#=> #<Set: {["1", "2"], ["3", "5"]}>
Reject elements from array1
We now iterate through array1
, rejecting elements for which the block evaluates to true
.
e1 = array1.reject
#=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"3"},
# {:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]:reject>
The first element of e1
is passed to the block:
h = e1.next
#=> {:a=>"1", :b=>"2", :c=>"3"}
a = (keys-h.keys).empty?
#=> ([:a,:b]-[:a,:b,:c]).empty? => true
b = set2.include?(h.values_at(*keys))
#=> set2.include?(["1","2"] => true
a && b
#=> true
so the first element of e1
is rejected. Next:
h = e1.next
#=> {:a=>"4", :b=>"5", :c=>"6"}
a = (keys-h.keys).empty?
#=> true
b = set2.include?(h.values_at(*keys))
#=> set2.include?(["4","5"] => false
a && b
#=> false
so the second element of e1
is not rejected. Lastly:
h = e1.next
#=> {:a=>1, :c=>4}
a = (keys-h.keys).empty?
#=> ([:a,:c]-[:a,:b]).empty? => [:c].empty? => false
so return true (meaning the last element of e1
is not rejected), as there is no need to compute:
b = set2.include?(h.values_at(*keys))