3

I'm quite new to Ruby, and was hoping to get the difference between two arrays.

I am aware of the usual method:

a = [...]
b = [...]
difference = (a-b)+(b-a)

But the problem is that this is computing the set difference, because in ruby, the statement (a-b) defines the set compliment of a, relative to b.

This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all of occurrences of 5 in the first set, not just one, behaving like a filter on the data.

I want it to remove differences only once, so for example, the difference of [1,2,2,3,4,5,5,5,5], and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.

I could do this iteratively:

a = [...]
b = [...]

complimentAbyB = a.dup
complimentBbyA = b.dup

b.each do |bValue|
  complimentAbyB.delete_at(complimentAbyB.index(bValue) || complimentAbyB.length)
end
a.each do |aValue|
  complimentBbyA.delete_at(complimentBbyA.index(aValue) || complimentBbyA.length)
end

difference = complimentAbyB + complimentBbyA

But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution than this. So my question is basically, what is the most elegant way of finding the difference of two arrays, where if one array has more occurrences of a single element then the other, they will not all be removed?

rp.beltran
  • 2,764
  • 3
  • 21
  • 29

2 Answers2

7

I recently proposed that such a method, Ruby#difference, be added to Ruby's core. For your example, it would be written:

a = [1,2,2,3,4,5,5,5,5]
b = [5]

a.difference b
  #=> [1,2,2,3,4,5,5,5]

The example I've often given is:

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

a.difference b
  #=> [1, 3, 2, 2] 

I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.

As shown at the links, the method could be written as follows:

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100
  • @Jesse, I've had more time to think about it. :-). Actually, I haven't given much thought to efficiency; it's possible it could be sped up. – Cary Swoveland Dec 18 '15 at 02:35
  • @Jesse, thanks for the fix. (Copy, paste, forget to change). – Cary Swoveland Dec 18 '15 at 02:42
  • This is really cool, thank you and I hope it gets put into Ruby. At the first link you shared, they asked if you could think of any practical reasons for having it implemented. If it helps, the reason why I wanted to use it is to find occurrences of triples in an array (like [1,2,2,2,3]). If an array has a triple, then the difference between array and array.unique will have a double, which is much easier to check for. – rp.beltran Dec 18 '15 at 02:42
  • If it's part of a application (and not just an exercise), you could post a brief description of the problem to that thread at Ruby Trunk. Readers might find that helpful. – Cary Swoveland Dec 18 '15 at 02:47
  • @CarySwoveland I knew you were doing to appear. – sawa Dec 18 '15 at 02:51
  • @rp.beltran That is a different question that might be answered along a different path (without using the notion you asked here). You may want to ask that on a different post. – sawa Dec 18 '15 at 02:52
  • @CarySwoveland I think your Ruby#difference is brilliant - if only ruby-lang had a voting system. – Rots Dec 18 '15 at 03:01
  • rp, I'm trying to remember who it was that asked if I could give a "real world example". The name was familiar. – Cary Swoveland Dec 18 '15 at 03:01
  • @sawa, I didn't mean to ask a separate question, I was referring to a comment that was made in the Ruby Trunk article posted where it was questioned when people would use this, so I was just giving my own situation as an example. @ CarySwoveland, alright, I'll do that. – rp.beltran Dec 18 '15 at 03:16
  • 1
    Good idea, thanks for pointing it out. I had switched to something similar, by defining the getDoubles as `array.difference(array.uniq)`, but you are correct that restructuring getTriples is a much nicer approach, and I quite like how much symmetry it keeps when moving to four of a kind. It's awesome how nicely this turns out. It seems this is a perfect use case for the function, I'm glad I asked this question instead of doing some clunky iterators like I was considering. – rp.beltran Dec 18 '15 at 07:56
3

.....

ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}

ha and hb are hashes with the element in the original array as the key and the number of occurrences as the value. The following merge puts them together and creates a hash whose value is the difference of the number of occurrences in the two arrays. inject converts that to an array that has each element repeated by the number given in the hash.


Another way:

ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten
sawa
  • 165,429
  • 45
  • 277
  • 381