How can I filter an array based on a hash of arrays while considering each value unique?

Question

In a project of mine, each request returns newly gathered data that also contains all the data from the previous request. I'd like to filter out the previously seen data and add what remains to the old data as a new array. New data comes in as an array, and the old data is stored in a hash of arrays.

I've tried a number of different methods to remove all past data points from the current data, without success. An important detail here is that the new data may contain duplicate values that match older ones but are technically new and should be treated as unique.

Here's an example data set:

x = {
  'a' => [],
  'b' => [1],
  'c' => [],
  'd' => [2, 3, 1, 5, 6, 3]
}
y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]

z = [0, 5, 10, 7]

x is the old data and y is the new data. The desired output of the filtering is z, which would then be added to x, giving us:

x = {
  'a' => [],
  'b' => [1],
  'c' => [],
  'd' => [2, 3, 1, 5, 6, 3],
  'e' => [0, 5, 10, 7]
}

I would need to continue repeating this for a bit based on some other criteria.

The main hurdle here is getting the filtering done correctly, which has been proving difficult for me. Here's a list of some of the things I've tried:

I've tried iterating across the hash's keys and then simply subtracting the arrays, but that doesn't work properly because it removes every occurrence of a matching value, including the duplicates I want to keep.

irb(main):024:0> d = [2, 3, 1, 5, 6, 3]
=> [2, 3, 1, 5, 6, 3]
irb(main):025:0> y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
=> [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
irb(main):026:0> y - d
=> [0, 10, 7]

I've tried unions

irb(main):029:0> y | d
=> [0, 2, 3, 5, 1, 6, 10, 7]

and intersections. (which are definitely wrong)

irb(main):030:0> y & d
=> [2, 3, 5, 1, 6]

I tried (unsuccessfully) implementing the following from the second comment here

class Array
  def delete_elements_in(ary)
    ary.each do |x|
      # Delete only the first occurrence of each element of ary.
      if index = index(x)
        delete_at(index)
      end
    end
  end
end

I've also tried reject!

irb(main):057:0> x = { 'a' => [], 'b' => [1], 'c' => [], 'd' => [2, 3, 1, 5, 6, 3] }
=> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3]}
irb(main):058:0> y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
=> [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
irb(main):059:0> x.each_key { |key| y.reject! { |v| x[key].index(v) } }
=> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3]}
irb(main):060:0> y
=> [0, 10, 7]

In a more recent attempt, I tried creating a new array from all of x's values and then using that against y, also unsuccessfully. I had just recently thought of trying to keep an array of 'seen' numbers, but I'm still stuck on items that actually need to be removed even though they are duplicates.

Throughout all this, I've been unable to get [0, 5, 10, 7] as a result.

Halp!

moveson · Answer 1 · 2018-01-13T02:16:31.147

Here's something that might work for you:

>> existing = x.values.flatten
#> [1, 2, 3, 1, 5, 6, 3]
>> z = y.dup # This avoids altering the original `y` array
>> existing.each { |e| z.delete_at(z.index(e)) if z.index(e) }
>> z
#> [0, 5, 10, 7] # z now contains the desired result

>> x['e'] = z
>> pp x
{"a"=>[],
 "b"=>[1],
 "c"=>[],
 "d"=>[2, 3, 1, 5, 6, 3],
 "e"=>[0, 5, 10, 7]}

Here's the whole thing in a single method:

def unique_array_filter(hash, new_array)
  existing = hash.values.flatten
  next_key = hash.keys.max.next
  temp = new_array.dup

  existing.each { |e| temp.delete_at(temp.index(e)) if temp.index(e) }

  hash[next_key] = temp
  hash
end

>> unique_array_filter(x, y)
#> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3], "e"=>[0, 5, 10, 7]}
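If you would rather pick the new key yourself and leave the original hash untouched, the same idea could be sketched roughly as below. The method name `filtered_addition` and its `key` parameter are made up for illustration; they are not part of the answer above.

```ruby
# Hypothetical variant of unique_array_filter: the key is passed in
# explicitly and the input hash is not modified.
def filtered_addition(hash, new_array, key)
  remaining = new_array.dup

  # Remove one occurrence from the copy for each previously stored value.
  hash.values.flatten.each do |e|
    i = remaining.index(e)
    remaining.delete_at(i) if i
  end

  # merge returns a new hash, so the caller's hash keeps its old keys.
  hash.merge(key => remaining)
end

x = { 'a' => [], 'b' => [1], 'c' => [], 'd' => [2, 3, 1, 5, 6, 3] }
y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]

result = filtered_addition(x, y, 'e')
result['e'] # => [0, 5, 10, 7]
x.key?('e') # => false
```

Note that `index` rescans the copy for every stored value, so this is probably fine for small arrays but may get slow for large ones.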

Thank you for your time and input! I see that I was pretty close! — metropolis, Jan 13 '18 at 17:48

Cary Swoveland · Accepted Answer · 2018-01-13T08:00:40.680

x.merge(x.keys.max.next => y.difference(x.values.flatten))
  #=> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3], "e"=>[0, 5, 10, 7]}

where Array#difference is defined as follows.

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

See the link for an explanation of Array#difference.
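For readers without the link at hand, here is a rough standalone sketch of the same counting idea, written without reopening `Array`. The name `multiset_difference` is only illustrative and is not a core Ruby method; the logic is intended to mirror `difference` above, spelled out more verbosely.

```ruby
# Hypothetical standalone version of the counting idea; `multiset_difference`
# is an illustrative name, not part of Ruby's core library.
def multiset_difference(array, other)
  # Count how many times each value occurs in `other`.
  remaining = other.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }

  # Drop an element only while its count has not been used up.
  array.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true
    else
      false
    end
  end
end

old_values = [1, 2, 3, 1, 5, 6, 3]
new_values = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]

multiset_difference(new_values, old_values) # => [0, 5, 10, 7]
multiset_difference([1, 1, 1], [1])         # => [1, 1]
```

Each value in `other` cancels exactly one matching value in `array`, starting with the earliest occurrences, so surplus duplicates survive in their original order. The whole thing takes one pass over each array.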

Thank you for your time and your comment! I've implemented your solution but have used `merge!` instead so that I can use arbitrary names rather than `x.keys.max.next`. – metropolis Jan 13 '18 at 17:47
metropolis, thank you for the suggested edit, but I think it's necessary for the coding of `difference` above to be exactly as it appears at the link. – Cary Swoveland Jan 13 '18 at 19:17
That's fair enough! – metropolis Jan 19 '18 at 22:53
