2

So, I have this Struct. It has some functions and operators overridden, but that's not so important.

public struct Vector3
{
    public float X, Y, Z;
}

I'm filtering a Collection of these via a HashTable and it works fine. But I want to verify that only duplicates are removed. By default my IDE gave me this code for the GetHashCode() method:

public override int GetHashCode()
{
    var hashCode = -307843816;
    hashCode = hashCode * -1521134295 + X.GetHashCode();
    hashCode = hashCode * -1521134295 + Y.GetHashCode();
    hashCode = hashCode * -1521134295 + Z.GetHashCode();
    return hashCode;
}

Which looks fishy to me. I'm pretty sure that the factors alone will generate a overflow on a INT_32. Furthermore i am confused that all three floats seem to have the same "weight" (-1521134295). Plus I am not really sure what GetHashCode() of a float does. It's bit Pattern itself should be unique already.

It has to be impossible, to create a unique 32 bit pattern from 96 bit's of input.

So how does it work?

  • Does the HashMap uses the equal Method before removing items with the same HashCode.

  • Am I just lucky?

  • And what are these Numbers my IDE generated as "weights" and startvalue?

PS: For those who are interested, the struct's code is here.

public struct Vector3
{
    public float X, Y, Z;

    public Vector3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }     

    public static bool operator ==(Vector3 a, Vector3 b)
    {
        return (a.X == b.X && a.Y == b.Y && a.Z == b.Z);
    }

    public static bool operator !=(Vector3 a, Vector3 b)
    {
        return (a.X != b.X || a.Y != b.Y || a.Z != b.Z);
    }

    public static Vector3 operator +(Vector3 a, Vector3 b)
    {
        return new Vector3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    public static Vector3 operator -(Vector3 a, Vector3 b)
    {
        return new Vector3(a.X - b.X, a.Y - b.Y, a.Z - b.Z);
    }

    public float Magnitued
    {
        get
        {
            return (float)Math.Sqrt((X * X) + (Y * Y) + (Z * Z));
        }
        private set { }
    }

    public Vector3 Normalized
    {
        get
        {
            var mag = Magnitued;
            return new Vector3(X / mag, Y / mag, Z / mag);
        }

        private set { }
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Vector3))
        {
            return false;
        }

        var vector = (Vector3)obj;
        return X == vector.X && Y == vector.Y && Z == vector.Z;
    }

    public override int GetHashCode()
    {
        var hashCode = -307843816;
        hashCode = hashCode * -1521134295 + X.GetHashCode();
        hashCode = hashCode * -1521134295 + Y.GetHashCode();
        hashCode = hashCode * -1521134295 + Z.GetHashCode();
        return hashCode;
    }
}
Tobias Tengler
  • 6,848
  • 4
  • 20
  • 34
Griizz
  • 155
  • 1
  • 12
  • 1
    Note that hashes aren't expected to be unique due to their finite size. See [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle). Good hashing algorithms minimize collisions for their typical input data. `GetHashCode()` is typically used to determine if two items _might_ be the same item. Further checks are required to confirm this. – ProgrammingLlama Jan 11 '19 at 07:46
  • 1
    The overflow is usually no problem, and you may wrap the code within an `unchecked` section. Anyway hashes aren´t ment to be unique, so two non-equal numbers may have the same hashcode. However two **equal** numbers **must** have the same hashcode. – MakePeaceGreatAgain Jan 11 '19 at 07:47
  • Usually structs provide a default-implementation for equals and hashcodes which is pretty fine. It does so by comparing every member bitwise, so your implementation seems to be redundant anyway. See https://stackoverflow.com/questions/8315656/in-net-4-0-what-is-the-default-implementation-of-equals-for-value-types for further details. – MakePeaceGreatAgain Jan 11 '19 at 07:52

1 Answers1

3

I think you have a misunderstanding here. Hash codes are not supposed to be unique, because as you have observed, sometimes it's impossible

The main requirement is that objects that are considered "equal" should have the same hash code.

Causing an overflow is okay, as long as the requirement is met.

This is stated in the documentation as well:

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes.

You can also see other implementations of GetHashCode in the linked page.

Sweeper
  • 213,210
  • 22
  • 193
  • 313