1

I need to generate a series of "defaults" for unknown object types with unique hash codes. (Could be structs or a class with overridden GetHashCode such that it's default construction always returns the same hash.)

To handle most such cases, I use reflection to find numeric fields / properties that I can sequence. So for example, if I can find a double field, I can just keep returning new objects with different values for that field that will generally change the hash.

So how to form that value sequence? At first I thought I just needed to stick to the range of integer resolution and could just keep adding 1.0 each iteration. Based on this answer, I choose a range of -1E50 to 1E50. But it turns out Double.GetHashCode returns the same hash for -1E50 and (-1E50 + 1.0). Reducing the range way down to -1E15 to 1E15 seems to fix it, but I still don't know if what I'm doing is safe. (I don't need anywhere near that big a range, but I do need to know I won't get collisions.)

So my question is, what range of double values can I use (incrementing by 1.0 or some other amount each time) to guarantee different hash results? Does this answer change for float or other numeric types?

Curtis
  • 85
  • 5
  • 1
    [FYI: Why are floating point numbers inaccurate?](https://stackoverflow.com/questions/21895756/why-are-floating-point-numbers-inaccurate) - You'll note that [`-1E50` and `-1E50 + 1` are the same value](https://rextester.com/SJZ57762). – ProgrammingLlama Oct 15 '20 at 02:46
  • If a class has int a; string s; data members - dont we do (in GetHashCode override) something like - return a.GetHashCode() ^ s.GetHashCode(); -- whats wrong with that? – Prateek Shrivastava Oct 15 '20 at 02:46
  • 1
    @Prateek OP is missing the point that `-1E50` and `-1E50 + 1` become the same value. – ProgrammingLlama Oct 15 '20 at 02:47
  • 1
    No its not safe, doubles cant hold that sort of precision. – TheGeneral Oct 15 '20 at 02:51
  • https://dotnetfiddle.net/fCxNAp – TheGeneral Oct 15 '20 at 02:51
  • You would be better off changing a *bit/s* strategically. however I feel this is all pointing back to a flawed design somewhere else – TheGeneral Oct 15 '20 at 02:52
  • 3
    Note that you should use the value returned by `GetHashCode` for _potential equality_. Because you're pigeonholing bigger values, there's always going to be room for collisions. You need to check the actual value to confirm equality. – ProgrammingLlama Oct 15 '20 at 02:57

1 Answers1

3

Hash values are expected to collide, so trying to predict if some undocumented implementation of GetHashCode will have particular properties is a bad idea. If you need particular properties/guarantees from a hash function - implement one yourself (with built in types you'd likely have to wrap them to another struct first).

More specific to you case

  • GetHashCode for double (or float) is generally not useful method as floating point calculations are imprecise and using value of expression to lookup in a dictionary indexed by floating point numbers pretty much guaranteed to fail all the time.
  • the particular range you started from (1E50) does not have enough precision to represent valued differing by 1, indeed smaller range would be able to represent all integers - you can see from MSDN-floating numbers that for double you get precision "15-17 digits". So yes -1E15 to 1E15 will have all integer values present with unique floating numbers.

If you want a "floating point" type that has reliable and guaranteed unique hash code for integer values you can have custom struct that returns rounded value as hash:

  struct MyDouble
  { 
      private double innerValue;
   
      // add implicit conversion to/from double

      public int override GetHashCode() { return (int)innerValue;}
      public bool oeverride Equal(object right) { return innerValue == (double)right;}
  }

Note: equal need to be implemented much better for real class, as well as all other comparisons need to be added.

Alexei Levenkov
  • 98,904
  • 14
  • 127
  • 179