9

I'm trying to skip duplicates for the list of values below. The result I'm trying to achieve is -112.94487230674, -49.47838592529, -89.9999574979198, I'm using the HashSet class. I've implemented the IEqualityComparer below but it's not working.

What am I doing wrong?

    class HeightEqualityComparer : IEqualityComparer<double>
    {
        public bool Equals(double a, double b)
        {
            return a - b < 1e-3;
        }

        public int GetHashCode(double value)
        {
            return value.GetHashCode();
        }
    }

Here is the list of values:

    [0] -112.94487230674    double
    [1] -112.94487230674001 double
    [2] -49.478385925290006 double
    [3] -49.47838592529     double
    [4] -49.478385925289992 double
    [5] -89.9999574979198   double
    [6] -89.99995749791978  double
    [7] -49.478385925289984 double
    [8] -89.999957497919809 double
    [9] -49.478385925290013 double
abenci
  • 8,422
  • 19
  • 69
  • 134
  • 2
    The rules for `GetHashCode` mean that for two values to even be checked with `Equals`, they must return exactly the same hash code value. Does your implementation ensure that's true for approximately equal values? – Damien_The_Unbeliever Jun 30 '20 at 07:15
  • 1
    You have to change `GetHashCode` as well – Dmitry Bychenko Jun 30 '20 at 07:15
  • 2
    Besides the other hints, you also need to use `Math.Abs`, else if B is bigger than A you will discard correct values unless they are all sorted. – Gusman Jun 30 '20 at 07:17
  • Try absolute value : return Math.Abs(a - b) < 1e-3; The Hash is only the first level of checking and has to match. So if you make the GetHashCode return 1 all the time then code will work. – jdweng Jun 30 '20 at 07:18
  • 1
    This is going to be monumentally difficult to achieve without lots and lots of bordercases that simply won't work. I would take a step back and consider what else you can do to achieve your real goal if such a hashset is not possible. – Lasse V. Karlsen Jun 30 '20 at 07:20
  • 3
    In fact, having commented on an answer, I've come to realise there's a problem with trying to do this by differences between pairs. Your equals method isn't transitive. `Equals(0,9e-4)` will return true, `Equals(9e-4,1.8e-3)` will also, but `Equals(0,1.8e-3)` will return false. – Damien_The_Unbeliever Jun 30 '20 at 07:20
  • Also, why should that particular -49 number be returned? From what I can see from the list, it is the second -49.xxx number provided, why would the hashset switch to using that one instead? – Lasse V. Karlsen Jun 30 '20 at 07:21
  • 1
    To follow up on @Damien_The_Unbeliever 's comment, if a==b and b==c but a!=c because of the lack of transitivity, would be it more correct to return b, or a and c? Specifically, **why** would either be more correct than the other is what you need to think about. – Lasse V. Karlsen Jun 30 '20 at 07:23

3 Answers3

6

Well, the current implementation of Equals

   return a - b < 1e-3;

is incorrect one. Equals must be

  1. Equals(a, a) == true;
  2. Symmetric: Equals(a, b) == Equals(b, a);
  3. Transitive Equals(a, b) && Equals(b, c) leads to Equals(a, c);

Conditions 2 and 3 are not hold in the current implementation. It's easy to repair the second condition with a help of Math.Abs; the third one is real difficulty: for arbitrary positive tolerance (which is 1e-3 in your case) we have

   a == a + tolerance == a + 2 * tolerance == ... == a + n * tolerance  

which means

   a == a + n * tolerance

for abitrary large n; thus a == b for all a and b (all numbers are equal).

As a partial (but valid) solution you can try rounding the values:

   class HeightEqualityComparer : IEqualityComparer<double>
   {
       public bool Equals(double a, double b)
       {
           return Math.Round(a, 3) == Math.Round(b, 3);
       }

       public int GetHashCode(double value)
       {
           return Math.Round(value, 3).GetHashCode();
       }
   } 

Note, that we have to change GetHashCode

Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
2

You'd need to round your values in GetHashCode to the same precision you are eliminating in the equality.

cineam mispelt
  • 393
  • 1
  • 8
1

Check John Skeets' answer here: How does HashSet compare elements for equality?

When you add an element to the set, it will find the hash code using IEqualityComparer.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).

In your code, all the values return different hash codes. You need to make sure that close values return the same.

Athanasios Kataras
  • 25,191
  • 4
  • 32
  • 61