12

In this MSDN article http://msdn.microsoft.com/en-us/library/ms132123.aspx it discusses the Class Equalitycomparer and has an example.In this example about comparing boxes it has this class -

class BoxSameDimensions : EqualityComparer<Box>
{
    public override bool Equals(Box b1, Box b2)
    {
        if (b1.Height == b2.Height & b1.Length == b2.Length
            & b1.Width == b2.Width)
        {
            return true;
        }
        else
        {
            return false;
        }
    }

    public override int GetHashCode(Box bx)
    {
        int hCode = bx.Height ^ bx.Length ^ bx.Width;
        return hCode.GetHashCode();
    }
}

I don't understand the line int hCode = bx.Height ^ bx.Length ^ bx.Width;

Could someone explain please? Why the xor?

Paul Turner
  • 38,949
  • 15
  • 102
  • 166
Paul Stanley
  • 1,999
  • 1
  • 22
  • 60
  • See http://stackoverflow.com/q/371328/11683 and http://stackoverflow.com/q/263400/11683. – GSerg Oct 02 '13 at 14:25
  • Also see http://stackoverflow.com/a/2334251/945456 and http://csharpindepth.com/ViewNote.aspx?NoteID=27 (the latter one explains why XOR might actually be a bad idea) – Jeff B Oct 02 '13 at 14:33

2 Answers2

12

The ^ operator is the bitwise exclusive-or operator.

In this case it's being used as a convenient way to generate a hash code from three integers. (I don't think it's a very good way, but that's a different issue...)

Weirdly, after constructing a hash code, they use GetHashCode() on it again, which is utterly pointless for an int because it will just return the int itself - so it's a no-op.

This is how they should have written it:

public override int GetHashCode(Box bx)
{
    return bx.Height ^ bx.Length ^ bx.Width;
}

This SO answer explains why XOR works quite well sometimes: Why are XOR often used in java hashCode() but another bitwise operators are used rarely?

Note: The reason I don't like using xor for a hash code for three ints like that is because:

a ^ b ^ a == b

In other words if the first and last ints contributing to the hash code are the same, they do not contribute to the final hash code at all - they cancel each other out and the result is always the middle int.

It's even worse if you are only using two ints because:

a ^ a == 0

So for two ints, for all cases where they are the same the hash code will be zero.

Community
  • 1
  • 1
Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • Yes thanks! just learning this and thought it strange! – Paul Stanley Oct 02 '13 at 14:33
  • It's also worth noting that in languages where out-of-bounds integer arithmetic is Undefined Behavior, the xor operator has the advantage of yielding defined behavior with any combination of operands, but in languages where integer types wrap cleanly, addition is just as fast and easy, and avoids the problem cases you describe. – supercat Oct 02 '13 at 14:57
  • So if we can guarantee the list of integers is distinct, can this be viewed as a good way of computing hash code? – Mike T Mar 30 '18 at 18:23
0

As you probably know the GetHashCode() is the function which should map your objects into the number such that probability of getting same number by two different objects should be as least as possible (and obviously this number should be always the same for the same object + function should be fast). From the all boolean operators (AND, OR, NOT, XOR) XOR gives best bit distribution (look at the OR, AND, XOR boolean tables). However I suggest you to check this approach: What is the best algorithm for an overridden System.Object.GetHashCode?. (hash function using prime numbers distribution property).

Community
  • 1
  • 1
fex
  • 3,488
  • 5
  • 30
  • 46