0

Both User.ID and Group.ID are Int16 and immutable and I want to generate an optimal HashCode.

This is the equality

    public override bool Equals(Object obj)
    {
        //Check for null and compare run-time types.
        if (obj == null || GetType() != obj.GetType()) return false;
        UserGroup item = (UserGroup)obj;
        return (User.ID == item.User.ID && Group.ID == item.Group.ID);
    }

What would be an optimal GetHashCode. Right now I am using the following but only because I saw it as an example. The primary use of the Object is in a HashSet and that HashSet gets a lot of .Select(x => x.User.ID = y) or .Select(x => x.Group.ID = y).

  public override int GetHashCode() { return (int)User.ID ^ (int)Group.ID; }
paparazzo
  • 44,497
  • 23
  • 105
  • 176

2 Answers2

1

Never skip the opportunity to generate a perfect hash. Two 16-bit shorts fit in a 32-bit int:

    public override int GetHashCode() {
        return (User.ID & 0xffff) + (Group.ID << 16);
    }

The first part of the expression isolates the lower 16 bits, the & operator is necessary to make it work properly for negative values so only bits 0 through 15 are used. The second part moves the 16 bits of Group.ID to bit positions 16 through 31. Adding them combines the two sets of 16 bits to make a 32-bit number.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • Don't fully understand this answer either but in testing with several values it did yield unique results. Those same values using XOR did not yield unique results so I is clearly a better hash. I posted another question an packing two 32 into a 64 and then extracting if you want to pick up a quick answer. – paparazzo Apr 01 '12 at 17:21
  • It's not just a better hash, it is a *perfect* hash :) – Hans Passant Apr 01 '12 at 17:27
0

Well, actually neither is preferred if you want good hash distribution (just one illustration: with your implementation, User.ID = 10, Group.ID = 5 will hash to the same value as User.ID = 5, Group.ID = 10). It's usable, of course, but take a look at what the master had to say about this.

Community
  • 1
  • 1
Alan
  • 6,501
  • 1
  • 28
  • 24
  • I saw that and trust Keets as the right answer but to be honest I did not understand it. – paparazzo Apr 01 '12 at 17:21
  • Well, the (vastly simplified) point is that you want to avoid different inputs to generate the same hash code (a "hash collision"), because then the performance of any hash-based algorithm or class will begin to degenerate from O(1) towards O(n). Hans quite nailed it with his answer, because in your use case it is possible to generate a hash code that is guaranteed to be unique for each unique input combination (it's a "perfect hash"): you have two 16-bit integers and the hash code is 32 bits, so it can accommodate all possible combinations of your inputs. – Alan Apr 02 '12 at 07:32