
In a project of mine, each request returns newly gathered data that also contains all the data from the previous requests. I'd like to filter out the old values and add what remains to the old data as a new array. The new data comes in as an array, and the old data is stored in a hash of arrays.

I've tried a number of different methods to remove all past data points from the current data, without success. An important detail is that the new data may contain values that match older ones but are technically new, and these should be treated as unique.

Here's an example data set:

x = {
  'a' => [],
  'b' => [1],
  'c' => [],
  'd' => [2, 3, 1, 5, 6, 3]
}
y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]

z = [0, 5, 10, 7]

x is the old data and y is the new data. The desired output of the filtering is z, which would then be added to x, giving us:

x = {
  'a' => [],
  'b' => [1],
  'c' => [],
  'd' => [2, 3, 1, 5, 6, 3],
  'e' => [0, 5, 10, 7]
}

I need to repeat this process several times, based on some other criteria.

The main hurdle is getting the filtering right, which has proven difficult for me. Here's a list of some of the things I've tried:

I've tried iterating across the hash's keys and simply subtracting the arrays, but that doesn't work: it removes every occurrence of a matching value, including the duplicates I want to keep.

irb(main):024:0> d = [2, 3, 1, 5, 6, 3]
=> [2, 3, 1, 5, 6, 3]
irb(main):025:0> y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
=> [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
irb(main):026:0> y - d
=> [0, 10, 7]

I've tried unions:

irb(main):029:0> y | d
=> [0, 2, 3, 5, 1, 6, 10, 7]

and intersections (which are definitely wrong):

irb(main):030:0> y & d
=> [2, 3, 5, 1, 6]

I tried (unsuccessfully) implementing the following from the second comment here

class Array
  def delete_elements_in(ary)
    ary.each do |x|
      if index = index(x)
        delete_at(index)
      end
    end
  end
end

I've also tried reject!

irb(main):057:0> x = { 'a' => [], 'b' => [1], 'c' => [], 'd' => [2, 3, 1, 5, 6, 3] }
=> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3]}
irb(main):058:0> y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
=> [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]
irb(main):059:0> x.each_key { |key| y.reject! { |v| x[key].index(v) } }
=> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3]}
irb(main):060:0> y
=> [0, 10, 7]

In a more recent attempt, I tried creating a new array from all of x's values and then using that against y, also unsuccessfully. I had just recently thought of keeping an array of 'seen' numbers, but I'm still stuck on items that need to be removed even though they are duplicates.

Throughout all this, I've been unable to get [0, 5, 10, 7] as a result.

Halp!

metropolis

2 Answers


Here's something that might work for you:

>> existing = x.values.flatten
#> [1, 2, 3, 1, 5, 6, 3]
>> z = y.dup # This avoids altering the original `y` array
>> existing.each { |e| z.delete_at(z.index(e)) if z.index(e) }
>> z
#> [0, 5, 10, 7] # z now contains the desired result

>> x['e'] = z
>> pp x
{"a"=>[],
 "b"=>[1],
 "c"=>[],
 "d"=>[2, 3, 1, 5, 6, 3],
 "e"=>[0, 5, 10, 7]}

Here's the whole thing in a single method:

def unique_array_filter(hash, new_array)
  existing = hash.values.flatten
  next_key = hash.keys.max.next
  temp = new_array.dup

  existing.each { |e| temp.delete_at(temp.index(e)) if temp.index(e) }

  hash[next_key] = temp
  hash
end

>> unique_array_filter(x, y)
#> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3], "e"=>[0, 5, 10, 7]}
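Note that the method above modifies the hash passed in and looks up each index twice. A minimal sketch of a non-mutating variant follows. It assumes a non-empty hash whose keys are single-letter strings that can be advanced with `String#next`, as in the example data. The name `append_unseen` is hypothetical and not part of the original answer.

```ruby
# Hypothetical helper: returns a new hash with the filtered batch appended,
# leaving the hash that was passed in untouched.
def append_unseen(hash, new_array)
  remaining = new_array.dup

  # For each previously stored value, remove one matching occurrence
  # (the first one found) from the copy of the new data.
  hash.values.flatten.each do |value|
    i = remaining.index(value)
    remaining.delete_at(i) if i
  end

  # Assumes keys like 'a'..'d', so the next key after 'd' is 'e'.
  hash.merge(hash.keys.max.next => remaining)
end

x = { 'a' => [], 'b' => [1], 'c' => [], 'd' => [2, 3, 1, 5, 6, 3] }
y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]

result = append_unseen(x, y)
p result['e'] # => [0, 5, 10, 7]
p x.key?('e') # => false
```

One caveat if this is repeated many times: `'z'.next` is `'aa'`, which sorts before `'z'`, so `keys.max.next` would stop producing fresh keys at that point. A counter or an array of batches may be a safer choice for long runs.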
moveson
x.merge(x.keys.max.next => y.difference(x.values.flatten))
  #=> {"a"=>[], "b"=>[1], "c"=>[], "d"=>[2, 3, 1, 5, 6, 3], "e"=>[0, 5, 10, 7]}

where Array#difference is defined as follows.

class Array
  def difference(other)
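    # Count how many times each element appears in `other`.
    counts = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    # Drop an element only while a matching count remains, consuming one per drop.
    reject { |e| counts[e] > 0 && counts[e] -= 1 }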
  end
end

See the link for an explanation of Array#difference.
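Be aware that Ruby 2.6 and later ship a built-in `Array#difference` that behaves like `-` (it removes every matching occurrence), so reopening `Array` as above replaces that core method. A sketch of the same counting idea as a standalone method, assuming Ruby 2.7+ for `Enumerable#tally`; the name `multiset_difference` is hypothetical:

```ruby
# Hypothetical standalone version of the counting approach.
# Removes from `array` one occurrence per occurrence in `other`,
# scanning left to right, without modifying either argument.
def multiset_difference(array, other)
  counts = other.tally # e.g. [1, 1, 2].tally is {1=>2, 2=>1}

  array.reject do |e|
    # Keep the element when no unmatched copy of it is left in `other`.
    next false unless counts.fetch(e, 0) > 0

    counts[e] -= 1
    true
  end
end

x = { 'a' => [], 'b' => [1], 'c' => [], 'd' => [2, 3, 1, 5, 6, 3] }
y = [0, 2, 3, 5, 1, 5, 6, 3, 1, 10, 7]

p multiset_difference(y, x.values.flatten) # => [0, 5, 10, 7]
```

Because the counts live in a hash, this runs in roughly linear time in the combined size of both arrays, whereas repeated `index` and `delete_at` calls rescan the array for each old value.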

Cary Swoveland