
I have a simple class:

public class TileName {
    int Zoom, X, Y;

    public override bool Equals (object obj)
    {
        var o = obj as TileName;
        return (o != null) && (o.Zoom == Zoom) && (o.X == X) && (o.Y == Y);
    }

    public override int GetHashCode ()
    {
        return (Zoom + X + Y).GetHashCode();
    }
}

I was curious if I would get a better distribution of hash codes if I instead did something like:

    public override int GetHashCode ()
    {
        return Zoom.GetHashCode() + X.GetHashCode() + Y.GetHashCode();
    }

This class is going to be used as a Dictionary key, so I do want to make sure there is a decent distribution.

Frank Krueger
Little warning: Please make sure the fields `Zoom`, `X`, and `Y` cannot be changed after creation of the type. The hash code of an instance must not be allowed to change, otherwise it will become impossible to find keys in your hash (I think FxCop validates this). Change the declaration `int Zoom, X, Y;` to `readonly int Zoom, X, Y;` to make it obvious. – Steven Apr 29 '10 at 06:26

5 Answers


As described by Jon Skeet in this SO answer, it is best practice to pick some prime numbers, multiply them with the individual hash codes, and sum everything up.

public override int GetHashCode()
{
    unchecked
    {
        int hash = 17;
        // Maybe nullity checks, if these are objects not primitives!
        hash = hash * 23 + Zoom.GetHashCode();
        hash = hash * 23 + X.GetHashCode();
        hash = hash * 23 + Y.GetHashCode();
        return hash;
    }
}

The problems with xor hashes are:

  • if X is equal to Y then your hash will be just Zoom, because then X ^ Y = X ^ X = 0 holds
  • xor is a symmetric operator, it will produce the exact same hashes for the objects [Zoom = 3, X = 5, Y = 7], [Zoom = 3, X = 7, Y = 5], [Zoom = 7, X = 5, Y = 3] etc.

These facts make the xor-method more likely to cause collisions.
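Both points can be verified with a quick sketch (the `XorHash` helper is hypothetical, mirroring the xor approach under discussion):

```csharp
using System;

class XorHashDemo
{
    // Hypothetical helper mirroring the xor-based hash.
    static int XorHash(int zoom, int x, int y) => zoom ^ x ^ y;

    static void Main()
    {
        // If X == Y, the two values cancel out and only Zoom remains.
        Console.WriteLine(XorHash(3, 5, 5)); // 3

        // Xor is commutative, so permutations of the same values collide.
        Console.WriteLine(XorHash(3, 5, 7) == XorHash(3, 7, 5)); // True
        Console.WriteLine(XorHash(3, 5, 7) == XorHash(7, 5, 3)); // True
    }
}
```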

In addition to Jon's post, consider using an unchecked context to explicitly ignore overflows. As MSDN says:

If neither checked nor unchecked is used, a constant expression uses the default overflow checking at compile time, which is checked. Otherwise, if the expression is non-constant, the run-time overflow checking depends on other factors such as compiler options and environment configuration.

So while overflows will usually be unchecked, they may be checked in some environment or under some compiler option, causing the hash computation to fail at run time. In this case you explicitly do not want these overflows checked.
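A minimal sketch of the difference (using a variable rather than a constant, since a constant overflow in a checked context would not even compile):

```csharp
using System;

class OverflowDemo
{
    static void Main()
    {
        int max = int.MaxValue;

        // Unchecked: the addition silently wraps around to int.MinValue.
        int wrapped = unchecked(max + 1);
        Console.WriteLine(wrapped == int.MinValue); // True

        // Checked: the same addition throws an OverflowException at run time.
        try
        {
            Console.WriteLine(checked(max + 1));
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }
    }
}
```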

Update:

By the way: `someInt.GetHashCode()` simply returns `someInt`. That is of course the fastest possible hash, with a perfect distribution and not a single collision. How else would you map an int to an int-sized hash? :) So what I wanted to say is: your first approach:

return (Zoom + X + Y).GetHashCode();

and your second one:

return Zoom.GetHashCode() + X.GetHashCode() + Y.GetHashCode();

are exactly the same. You don't even have to call GetHashCode, and both are very likely to produce collisions. Maybe even more so than the xor method, if you mostly have small values for all three ints.
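A quick sketch confirms that both variants compute the same value:

```csharp
using System;

class SameHashDemo
{
    static void Main()
    {
        int Zoom = 3, X = 5, Y = 7;

        // Int32.GetHashCode() just returns the value itself.
        Console.WriteLine(X.GetHashCode() == X); // True

        // Hence both variants from the question boil down to Zoom + X + Y.
        int first  = (Zoom + X + Y).GetHashCode();
        int second = Zoom.GetHashCode() + X.GetHashCode() + Y.GetHashCode();
        Console.WriteLine(first == second); // True (both are 15)
    }
}
```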

Update 2:

As I wrote in the comment on ChaosPandion's post: if you just have those three int values, and X, Y and Zoom are relatively small numbers (smaller than 1000 or 10000), this may also be a good hash generator:

public override int GetHashCode()
{
    return (X << 16) ^ (Y << 8) ^ Zoom;
}

It just distributes the bits in the hash value (binary shown most significant bit first for readability):

00000000 00000000 00000011 00110001    X = 817
00000000 00000000 00011011 11111010    Y = 7162
00000000 00000000 00000010 10010110    Zoom = 662

00000011 00110001 00000000 00000000    X << 16
00000000 00011011 11111010 00000000    Y << 8
00000000 00000000 00000010 10010110    Zoom

00000011 00101010 11111000 10010110    (X << 16) ^ (Y << 8) ^ Zoom
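A brute-force sanity check (hypothetical sketch): for values small enough that the shifted bit ranges cannot overlap, every triple gets a unique hash:

```csharp
using System;
using System.Collections.Generic;

class ShiftHashDemo
{
    static int ShiftHash(int x, int y, int zoom) => (x << 16) ^ (y << 8) ^ zoom;

    static void Main()
    {
        // With all values below 64, x occupies bits 16-21, y bits 8-13
        // and zoom bits 0-5, so the xor can never cancel anything out.
        var seen = new HashSet<int>();
        for (int x = 0; x < 64; x++)
            for (int y = 0; y < 64; y++)
                for (int zoom = 0; zoom < 64; zoom++)
                    seen.Add(ShiftHash(x, y, zoom));

        Console.WriteLine(seen.Count == 64 * 64 * 64); // True: no collisions
    }
}
```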
Philip Daubmeier

I know this question is kind of old, but nowadays you can easily create a hash code by using the System.HashCode struct:

https://learn.microsoft.com/en-us/dotnet/api/system.hashcode.combine?view=netcore-3.1

In this specific case it would look like this:

public override int GetHashCode()
{
    return HashCode.Combine(Zoom, X, Y);
}
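A small usage sketch: within a single process, `HashCode.Combine` is deterministic for equal inputs, and (unlike plain xor) it is order-sensitive. Note the values are randomized per process, so they must never be persisted:

```csharp
using System;

class CombineDemo
{
    static void Main()
    {
        // Equal inputs always combine to the same hash within one process.
        Console.WriteLine(HashCode.Combine(1, 2, 3) == HashCode.Combine(1, 2, 3)); // True

        // Argument order matters, so permuted fields are mixed differently
        // (almost certainly a different value, though not guaranteed).
        Console.WriteLine(HashCode.Combine(1, 2, 3) == HashCode.Combine(3, 2, 1));
    }
}
```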

suchoss
Cícero Neves

Neither of the implementations in your question is ideal. For example, they'll return exactly the same hash for { Zoom=1, X=2, Y=3 }, { Zoom=2, X=3, Y=1 }, { Zoom=3, X=1, Y=2 } etc etc.

I usually use something like this:

public override int GetHashCode()
{
    // 269 and 47 are primes
    int hash = 269;
    hash = (hash * 47) + Zoom.GetHashCode();
    hash = (hash * 47) + X.GetHashCode();
    hash = (hash * 47) + Y.GetHashCode();
    return hash;
}

(From memory, I think the C# compiler uses something similar when it generates the GetHashCode methods for anonymous types.)

LukeH

I've actually found this to be really effective.

public override int GetHashCode ()
{
    return Zoom.GetHashCode() ^ X.GetHashCode() ^ Y.GetHashCode();
}
ChaosPandion
  • While this is better than the implementations in the question, it's still not great. For example, it doesn't take the field ordering into account, so `{ Zoom=1, X=2, Y=3 }`, `{ Zoom=2, X=3, Y=1 }`, `{ Zoom=3, X=1, Y=2 }` etc etc will all result in the same hash being returned. Some sort of rolling multiplication and/or sum will avoid this (and probably give better distribution too). – LukeH Apr 28 '10 at 23:53
  • @Luke: agreed. @ChaosPandion: please read this here: http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode/263416#263416 – Philip Daubmeier Apr 28 '10 at 23:56
  • @Luke - I agree, generally I will always try to use the simplest solution to any problem. For any serious application you will want to use an algorithm with a smaller chance of collision. – ChaosPandion Apr 28 '10 at 23:58
  • @ChaosPandion: Jon Skeet's solution is just as simple _and_ has a smaller chance of collision. It's not like it's a very sophisticated, large algorithm or anything. If you don't care about collisions you can just `return 1;` statically for every instance. Ok, just kidding... :D – Philip Daubmeier Apr 29 '10 at 00:09
  • @Philip - I've used this for dictionaries of objects upwards of 500K in size without any problem. Of course the values I hashed were more complex than the example. – ChaosPandion Apr 29 '10 at 00:22
  • @ChaosPandion: You're right, I have used that xoring very often, too. It clearly depends on the values and their distribution. If you have millions of entries in the dictionary, and every entry has two int fields <= 50, the 'prime number multiplication' method may make the dictionary much faster. Other applications may do just fine with the xoring. – Philip Daubmeier Apr 29 '10 at 00:31
  • 3
    Just came up with a new idea, maybe a compromise between our two solutions: If you just have those three int values, and `X`, `Y` and `Zoom` are relatively small numbers (smaller than 1000 or 10000) this one may be also a good hash generator: `return (X << 16) ^ (Y << 8) ^ Zoom;` – Philip Daubmeier Apr 29 '10 at 00:36
  • @Philip - I like it. That certainly would resolve the situation you described. – ChaosPandion Apr 29 '10 at 00:41
public override int GetHashCode ()
{
    return (Zoom.ToString() + "-" + X.ToString() + "-" + Y.ToString()).GetHashCode();
}
James Westgate
  • This will probably give a nice distribution, but is really bad for performance, because at least one new string and one new string array is created on each call to GetHashCode. You'd rather have a bad distribution than this. – Steven Apr 28 '10 at 22:37
  • @Steven, this can be cached once calculated, and clean the cached value any time Zoom, X or Y are set. – Fede Apr 29 '10 at 00:01
  • @Fede: you can either cache the result of a slow algorithm, or just use the fast one. And btw: caching makes only sense if you have readonly fields, or you have to store the fields old values, too. That would get messy... – Philip Daubmeier Apr 29 '10 at 00:15
  • @Philip: you don't need to store the old values. You can cache the result of GetHashCode in a nullable int. If the cache is null, you calculate it, if it isn't then just return that. When setting the fields that affect the cache just set the cache to null. Caching makes sense when the cost of the operation times the number of times it will be called is the cause of a bottleneck. – Fede Apr 29 '10 at 04:24
  • The OP did ask for decent distribution. Performance could be an issue but what size dataset are we looking at? – James Westgate Mar 16 '17 at 22:05
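The caching idea from the comments could be sketched like this (hypothetical property names; assuming the fields stay mutable, as in the question):

```csharp
using System;

public class TileName
{
    private int zoom, x, y;
    private int? cachedHash; // invalidated whenever a component changes

    public int Zoom { get => zoom; set { zoom = value; cachedHash = null; } }
    public int X    { get => x;    set { x    = value; cachedHash = null; } }
    public int Y    { get => y;    set { y    = value; cachedHash = null; } }

    public override bool Equals(object obj) =>
        obj is TileName o && o.Zoom == Zoom && o.X == X && o.Y == Y;

    public override int GetHashCode()
    {
        // The expensive string concatenation runs at most once per mutation.
        if (cachedHash == null)
            cachedHash = (Zoom + "-" + X + "-" + Y).GetHashCode();
        return cachedHash.Value;
    }
}
```

Note that caching only amortizes the cost of the concatenation; mutating a key while it is stored in a Dictionary will still make it unfindable.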