2

I'm trying to override GetHashCode to ensure uniqueness, since i use the instances as keys in a dictionary:

IDictionary<Base, int> _counts = new Dictionary<Base,int>();

The two classes I have an issue with are:

class sealed First : Base
{
    public MyEnum1 value;
    public ExtrasEnum extras;

    public override int GetHashCode()
    {
        unchecked
        {
            return ((int)value* 397) ^ (int)extras;
        }   
    }

    //Other stuff
}

class sealed Second : Base
{
    public MyEnum2 value;
    public ExtrasEnum extras;

    public override int GetHashCode()
    {
        unchecked
        {
            return ((int)value* 397) ^ (int)extras;
        }            
    }

    //Other stuff
}

However. The issue is that when the value and extras int values become the same, then the hash codes will be equal. The calculation was the recommended one by Resharper. How do i ensure that the hashcodes for theese classes does not be come the same? Just mix it up a little with another prime number, or?

EDIT: Just to explain. I need that if to instances of First has the same value and extras values, then these two instances must be considered the same, but if an instance of First, and a instance of Second have the same int values of value and extras, then these must not be considered the same.

I'm not looking into performance, but just to ensure that same class instances are equal, and different class instances are different.

Anders
  • 1,590
  • 1
  • 20
  • 37
  • Am I missed something? If both value and extras the same hashes MUST be the same as well. This why hash is used. You want to separate First from Second? Then you could introduce some class constant, set it to unique value in every class and involve it in hash calculations. – Tommi Apr 18 '15 at 11:13
  • Yes, of course, however the `value` field of each class is a different enum type. I was just wondering if that should not be taken into consideration in the hashcode, and if so, how? – Anders Apr 18 '15 at 11:14
  • I think casting enum to int eliminates any difference. – Tommi Apr 18 '15 at 11:15
  • use some bits to differentiate enum types – Orifjon Apr 18 '15 at 11:15
  • @Tommi: How would you recommend to implement that unique class constraint? – Anders Apr 18 '15 at 11:15
  • 1
    If you want uniqueness for adding to dictionary, Hashcodes dont have to be unique. Interesting SO question [here](http://stackoverflow.com/questions/3809820/hash-code-in-dictionarytkey-tvalue) – Orifjon Apr 18 '15 at 11:26
  • Do make sure that you **do not** compute hash codes using mutable fields. Both of your `value` and `extras` values are publicly mutable. If these values change the hash code changes and that will break any structure that uses hashes for look-ups - like `Dictionary` for example. – Enigmativity Apr 18 '15 at 12:29
  • @Enigmativity This sadly is more a "in a perfect world" :-) – xanatos Apr 18 '15 at 14:01
  • @xanatos - seriously? It's a design constraint that must be followed if you want your code to operate within the BCL constraints. – Enigmativity Apr 18 '15 at 23:02
  • @Enigmativity: no worries. Nothing is changed after the instance is taken in use/added. I have added a Freezable pattern to the classes, to ensure this. – Anders Apr 19 '15 at 05:52
  • @Enigmativity Quotes... Give me a quote of this. Haven't ever read it. Unless you are speaking of *If these values change the hash code changes and that will break any structure that uses hashes for look-ups - like Dictionary* then you are absolutely right. I was speaking of *Do make sure that you do not compute hash codes using mutable fields.* – xanatos Apr 19 '15 at 06:16
  • @xanatos - Do have a read of http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/. – Enigmativity Apr 19 '15 at 09:34
  • @Enigmativity Perhaps it is you that should read it... **Guideline** *: the integer returned by GetHashCode should never change* , **Ideally**, *the hash code of a mutable object should be computed from only fields which cannot mutate, and therefore the hash value of an object is the same for its entire lifetime.*, **Rule**: *the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable*. So as I said. – xanatos Apr 19 '15 at 09:38
  • @xanatos - I'm not following you. The text you've quoted is exactly what I wanted you to read. – Enigmativity Apr 19 '15 at 09:39
  • You wrote *Do make sure that you do not compute hash codes using mutable fields.* That is Lippert's guideline. The rule is *If these values change the hash code changes and that will break any structure that uses hashes for look-ups - like Dictionary for example*. So the suggestion should be *Do make sure that GethashCode doesn't change while the item is in the Dictionary*. You integrated the guideline in the rule, making the whole a rule – xanatos Apr 19 '15 at 09:44

2 Answers2

3

It isn't very difficult to generate a perfect hash from enum members. With the assumption that they won't have more than 256 members, you can write a fast one with:

public override int GetHashCode() {
    return ((int)value << 8) ^ (int)extras; 
}

And not generate any collisions at all by writing Second.GetHashCode() as:

public override int GetHashCode() {
    return ((int)value << 16) ^ (int)extras; 
}

Very simple and perfect, but doesn't scale of course when you add more derived classes. It really doesn't need to, you are micro-optimizing without having any insight in how this really speeds up your code. Do keep in mind that a perfect hash does not avoid bucket collisions in the dictionary, the bucket index is calculated by taking the modulo of the hash code with a prime. The larger the number of items in the dictionary, the larger the prime.

Just don't do this at all. And always use a profiler if you want to know if you need to anyway.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
2

I assume that you think hash codes must not collide. This is clearly impossible to ensure in general. The following implementation of GetHashCode is always valid: return 0;. (It's just slow but not incorrect.)

The way to go about this is to keep this hash code computation of yours (since its fine) but to also override Equals. There you can differentiate between the two types. For example by saying:

if (a.GetType() != b.GetType()) return false;

In case I misuderstood your concerns a literal answer to your question would be to factor in the type of the class:

oldHashCode ^ this.GetType().GetHashCode();

(This does not ensure uniqueness, either.)

usr
  • 168,620
  • 35
  • 240
  • 369
  • Arh... So if the hashcodes are the same, then the `Equals` method will be invoked by the dictionary, to ensure uniqueness? Awesome.. This will be acceptable, since i already have the Equals methods overridden as well, and the collision described only happens rarely. – Anders Apr 18 '15 at 11:25
  • Yes. I mean how do you thing the string type implements GetHashCode? There are infinitely many strings. They can't all have a unique hash code. The dict applies the modulo operator the hash code anyway. If the capacity is 13 elements than only 13 different hash codes get into play anyway. – usr Apr 18 '15 at 11:27