1

I'm trying to implement a custom class including the comparable mixin. Later the class is used to do a diff. The class looks like

class Element
  include Comparable
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def <=>(other)
    @name <=> other.name
  end
end

Now some test values with two added entries compared to a.

a = Array.new
b = Array.new

a.push Element.new('y1')
a.push Element.new('y2')
a.push Element.new('y4')
a.push Element.new('x1')
a.push Element.new('x2')
a.push Element.new('x4')

b.push Element.new('y1')
b.push Element.new('y2')
b.push Element.new('y3')
b.push Element.new('y4')
b.push Element.new('x1')
b.push Element.new('x2')
b.push Element.new('x3')
b.push Element.new('x4')

And now run diff between both arrays

puts Diff::LCS.diff(a, b).inspect

I would expect two added objects now but it finds 8 changes... Any ideas why?

[[["-", 2, #<Element:0x007fa9fb567898 @name="y4">],
  ["-", 3, #<Element:0x007fa9fbc69830 @name="x1">],
  ["-", 4, #<Element:0x007fa9fbca3378 @name="x2">],
  ["+", 2, #<Element:0x007fa9fb5e5e78 @name="y3">],
  ["+", 3, #<Element:0x007fa9fb606920 @name="y4">],
  ["+", 4, #<Element:0x007fa9fb625848 @name="x1">],
  ["+", 5, #<Element:0x007fa9fb647da8 @name="x2">],
  ["+", 6, #<Element:0x007fa9fbde8670 @name="x3">]]]

If we run the test with

a = Array.new
b = Array.new

a.push 'y1'
a.push 'y2'
a.push 'y4'
a.push 'x1'
a.push 'x2'
a.push 'x4'

b.push 'y1'
b.push 'y2'
b.push 'y3'
b.push 'y4'
b.push 'x1'
b.push 'x2'
b.push 'x3'
b.push 'x4'

Everything works as expected:

[[["+", 2, "y3"]],
 [["+", 6, "x3"]]]
Patrick Oscity
  • 53,604
  • 17
  • 144
  • 168
Paddi91
  • 93
  • 5
  • @muistooshort added the output. It's really strange, I can't see why it does not work. The result is however correct, it's just not the minimal possible diff that describes the changes. – Patrick Oscity Mar 06 '14 at 09:39
  • What is `Diff::LCS.diff(a, b)`? Is it standard Ruby module (1.9.3/2.n)? – Darek Nędza Mar 06 '14 at 09:55
  • @DarekNędza in the example I used https://github.com/halostatue/diff-lcs but it looks like it is the same with any other diff algorithm. – Paddi91 Mar 06 '14 at 10:08

1 Answers1

2

Inspecting from source of Diff::LCS, the elements in sequences are used as hash key.

The Element class you wrote mixed the Comparable module, it got the == method there. However, it doesn't have eql? and hash method override, which is used by the Hash to determine the key comparison.

With you Element class, we have

irb(main):013:0> a = Element.new("a")
=> #<Element:0x002ae4ce402da0 @name="a">
irb(main):014:0> b = Element.new("a")
=> #<Element:0x002ae4ce409ec0 @name="a">
irb(main):015:0> a == b
=> true
irb(main):016:0> a.eql? b
=> false
irb(main):017:0> a.hash == b.hash
=> false
irb(main):018:0> h = {}
=> {}
irb(main):019:0> h[a] = 1
=> 1
irb(main):020:0> h[b]
=> nil

This affects the LCS calculation.

The fix is to add the eql? and hash method for the Element class, I think.

class Element
  include Comparable
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def <=>(other)
    @name <=> other.name
  end

  def eql? other
    other.class == Element and self.name == other.name
  end

  def hash
    @name.hash ^ Element.hash
  end
end
Arie Xiao
  • 13,909
  • 3
  • 31
  • 30
  • Perfect! That's it :) Could you comment the hash function a little bit? What do we achieve by XORing the both hashes? – Paddi91 Mar 06 '14 at 14:07
  • @Paddi91 You can take a look at [What's the difference between equal?, eql?, ===, and ==?](http://stackoverflow.com/questions/7156955), [Hash table](http://en.wikipedia.org/wiki/Hash_table). Generally, if a equals b, then hash(a) equals hash(b); however, hash(a) equals hash(b) doesn't imply a equals b. – Arie Xiao Mar 06 '14 at 14:47