C# collisions with Double.GetHashCode()

Question

I need to generate a series of "defaults" for unknown object types with unique hash codes. (Could be structs or a class with overridden GetHashCode such that it's default construction always returns the same hash.)

To handle most such cases, I use reflection to find numeric fields / properties that I can sequence. So for example, if I can find a double field, I can just keep returning new objects with different values for that field that will generally change the hash.

So how to form that value sequence? At first I thought I just needed to stick to the range of integer resolution and could just keep adding 1.0 each iteration. Based on this answer, I choose a range of -1E50 to 1E50. But it turns out Double.GetHashCode returns the same hash for -1E50 and (-1E50 + 1.0). Reducing the range way down to -1E15 to 1E15 seems to fix it, but I still don't know if what I'm doing is safe. (I don't need anywhere near that big a range, but I do need to know I won't get collisions.)

So my question is, what range of double values can I use (incrementing by 1.0 or some other amount each time) to guarantee different hash results? Does this answer change for float or other numeric types?

[FYI: Why are floating point numbers inaccurate?](https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate) - You'll note that [`-1E50` and `-1E50 + 1` are the same value](https://rextester.com/SJZ57762). — ProgrammingLlama, Oct 15 '20 at 02:46
If a class has int a; string s; data members - dont we do (in GetHashCode override) something like - return a.GetHashCode() ^ s.GetHashCode(); -- whats wrong with that? — Prateek Shrivastava, Oct 15 '20 at 02:46
@Prateek OP is missing the point that `-1E50` and `-1E50 + 1` become the same value. — ProgrammingLlama, Oct 15 '20 at 02:47
You would be better off changing a *bit/s* strategically. however I feel this is all pointing back to a flawed design somewhere else — TheGeneral, Oct 15 '20 at 02:52
Note that you should use the value returned by `GetHashCode` for _potential equality_. Because you're pigeonholing bigger values, there's always going to be room for collisions. You need to check the actual value to confirm equality. — ProgrammingLlama, Oct 15 '20 at 02:57

score 3 · Answer 1 · answered Oct 15 '20 at 03:31

Hash values are expected to collide, so trying to predict if some undocumented implementation of GetHashCode will have particular properties is a bad idea. If you need particular properties/guarantees from a hash function - implement one yourself (with built in types you'd likely have to wrap them to another struct first).

More specific to you case

GetHashCode for double (or float) is generally not useful method as floating point calculations are imprecise and using value of expression to lookup in a dictionary indexed by floating point numbers pretty much guaranteed to fail all the time.
the particular range you started from (1E50) does not have enough precision to represent valued differing by 1, indeed smaller range would be able to represent all integers - you can see from MSDN-floating numbers that for double you get precision "15-17 digits". So yes -1E15 to 1E15 will have all integer values present with unique floating numbers.

If you want a "floating point" type that has reliable and guaranteed unique hash code for integer values you can have custom struct that returns rounded value as hash:

  struct MyDouble
  { 
      private double innerValue;
   
      // add implicit conversion to/from double

      public int override GetHashCode() { return (int)innerValue;}
      public bool oeverride Equal(object right) { return innerValue == (double)right;}
  }

Note: equal need to be implemented much better for real class, as well as all other comparisons need to be added.

C# collisions with Double.GetHashCode()

1 Answers1