1

In the implementation of GetHashCode below, when Collection is null or empty will both result in a hash code of 0.

A colleague suggested return a random hard coded number like 19 to differentiate from a null collection. Why would I want to do this? Why would I care that a null or empty collection produces a different hash code?

public class Foo
{
    public List<int> Collection { get; set; }
    // Other properties omitted.

    public int override GetHashCode()
    {
        var hashCode = 0;
        if (this.Collection != null)
        {
            foreach (var item in this.Collection)
            {
                var itemHashCode = item == null ? 0 : item.GetHashCode();
                hashCode = ((hashCode << 5) + hashCode) ^ itemHashCode;
            }
        }

        return hashCode;
    }
}
Muhammad Rehan Saeed
  • 35,627
  • 39
  • 202
  • 311
  • 2
    "should I return a random hard coded number like 123 to differentiate from a null collection" How should this **ever** be **correctly** answered? It´s completely up to your scenario. In fact there is no single correct hashcode-implementation. that fits them all. Personally I won´t bother too much for one collision more, as long as not **all** possible values return the exact same hashcode. – MakePeaceGreatAgain Jun 12 '19 at 07:52
  • 3
    Why override `GetHashCode` in tie first place? What do you want to achieve? – Patrick Hofman Jun 12 '19 at 07:53
  • 1
    If you assign a different *meaning* to an empty collection vs. a null-reference and no collection, then yes, those two things should produce different hash codes. If you don't, then you can just use the same, usually 0 is fine. – Lasse V. Karlsen Jun 12 '19 at 07:56
  • 2
    Jon Skeet has a [good hashcode implementation for lists](https://stackoverflow.com/a/8094931/1663001), but it really depends what you are trying to do here. Are you testing for list equality or list sequence equality? – DavidG Jun 12 '19 at 07:57
  • 1
    My two cents is that you should never have a null-reference to a collection anyway, because usually you don't treat that differently from an empty one. – Lasse V. Karlsen Jun 12 '19 at 07:57
  • However, nobody here can answer this question for you, the only person that can answer this question is you, the rest you will get is opinions. – Lasse V. Karlsen Jun 12 '19 at 07:58
  • Reworded the question to be a bit clearer. I'm less worried about the implementation, I care more about the **why**. Why should I care to differentiate `null` from an empty collection in a hash code? – Muhammad Rehan Saeed Jun 12 '19 at 08:13
  • I doubt you **should**. However it **may** make sense if a null-collection is semantically different from an empty one in your scenario. If both share the same semnatics there surely is no need to handle them differently, is it? In doubt I won´t bother for that, in fact you could even return the exact same hashcode for every possible value. However this may lead to performance-issues when using dictionaries or hashsets. – MakePeaceGreatAgain Jun 12 '19 at 08:15
  • 19 is not better than 0. There is more substantial trouble, this is not the kind of object that is ever a good candidate for a Dictionary key or HashSet element. The kind of collection objects where GetHashCode() actually get used. The correct implementation is `throw new NotImplementedException();` – Hans Passant Jun 12 '19 at 08:17
  • Btw.: you probably shouldn´t introduce any collection into a hashcode at all, as it will very likely change at some point. And members within hashcodes should never change, at least not while the instances are stored in a ditctionary. – MakePeaceGreatAgain Jun 12 '19 at 08:21
  • 1
    @HansPassant If the collection is *large* it makes for a bad dictionary key. Collections of items with single-digit numbers of items in them can make for fine dictionary keys. – Servy Jun 12 '19 at 15:50
  • 1
    It is not the only problem, bigger issue is that it should not be so easily mutable. Accidentally call the list object setter or modify a list item and you can't find the object back anymore. Very hard to debug. The exception keeps everybody honest and outcomes predictable. – Hans Passant Jun 12 '19 at 15:59
  • @HansPassant If it's convenient to change, yes, it's better if it's immutable, but it's not exactly difficult to not mutate an object while it is currently a dictionary key. It's perhaps a problem if you're exposing it in a library potentially used by people not familiar with how hash based collections work, it's likely to be fine when used in a particular application in which all programmers working on that application know how hash based collections work and that they shouldn't mutate a key. Just having the program not work at all (in a clear way) isn't always an option. – Servy Jun 12 '19 at 17:08

1 Answers1

2

The design of GetHashCode is that it is supposed to minimize the number of collisions that will take place, as best as it can. While having some hash collisions is inevitable, you'll want to be mindful of what types of objects are colliding, what type of data are going to be stored in your hash based collections, and working to ensure that types of objects stored together in the same collection are less likely to collide.

So if you happen to know something about how hash-based collections of this type are going to be used, and that there are likely to be both null and empty objects in them, then it would improve the performance to have them not collide. If you suspect that having both a null and empty value in the same collection is not particularly likely, then having them collide isn't actually a concern.

Servy
  • 202,030
  • 26
  • 332
  • 449