7

I'm working on a geometry in space project and I have different geometrical entities, among which Point. Sometimes two points are equal but for small numerical errors due to calculation, such as 1 and 1.0000000001, so I implemented the __eq__ method with math.isclose() function to sort this things out.

class Point(object):

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __eq__(self, other):
        if isinstance(other, Point):
            equal_x = math.isclose(other.x, self.x, rel_tol=5e-3, abs_tol=1e-5)
            equal_y = math.isclose(other.y, self.y, rel_tol=5e-3, abs_tol=1e-5)
            equal_z = math.isclose(other.z, self.z, rel_tol=5e-3, abs_tol=1e-5)
            if equal_x and equal_y and equal_z:
                return True

        return False   

How to implement the __hash__ method in order for such two objects to be equal when using sets and dictionaries?

The ultimate goal is to use the following function to "uniquify" a list of such object and remove duplicates:

def f12(seq):
    # from Raymond Hettinger
    # https://twitter.com/raymondh/status/944125570534621185
    return list(dict.fromkeys(seq))
John Kugelman
  • 349,597
  • 67
  • 533
  • 578
  • 5
    The absolute requirement of `__hash__()` is that it return equal values for objects that `__eq__()` says are equal. So consider some object of your class named `X`, with a certain hash code. Add a little bit to each coordinate, so that they still pass the `isclose()` test: the hash *must* remain the same. Add a little bit more, the hash must still be the same, and so on. The *only* `__hash__()` implementation that is consistent with your `__eq__()` is to return a constant - which of course destroys the efficiency of sets or dicts. – jasonharper Mar 10 '20 at 16:36
  • 3
    `__eq__` should be transitive, but yours is not. It's possible for `x == y` and `y == z` without `x == z` being true. – chepner Mar 10 '20 at 16:36
  • Hi Enrico, have you read through [What's a correct and good way to implement __hash__()?](https://stackoverflow.com/questions/2909106/whats-a-correct-and-good-way-to-implement-hash) – Brydenr Mar 10 '20 at 16:37
  • Hash should be a fast-calculating method that gives same result for "same" things (in quotes because you can define it differently). And preferably different results for "different" things. - Preferably, so you can even implement hash that always returns `1`, but that wouldn't speed things up. – h4z3 Mar 10 '20 at 16:42
  • 1
    In your case consider what values do xyz take - and divide the space accordingly, e.g. 1x1x1 box in the space 10x10x10 => hash could be `100*round(x)+10*round(y)+round(z)` But as chepner said, you have other issues here and the "box" thing I mention for hashes won't work (1.9(...)95 and 2.0(...)01 being in different boxes, but being "equal") – h4z3 Mar 10 '20 at 16:42
  • That tweet makes the assumption that `x != y` implies `hash(x) != hash(y)`, which is not always the case. For example, on my machine, `hash(8) == hash(2**64)`. (That, or it intends the phrase "hashable duplicate" to mean, literally, "duplicate as defined by hash", not just "duplicates of hashable values".) – chepner Mar 10 '20 at 17:08

2 Answers2

3

There is a general problem with your equal method.

Most people will expect transitivity in a equal method. This means that when two points a and b are equal and a is also equal to an other point c, then a and c should be also equal. That doesn't have to be the case in your implementation.

Imagine each point, have a sphere around him, where the points are considered equal. The following drawing shows this spheres (or better half the radius of the sphere) and overlapping therefore implies that the points are equal:

point visualization

So a and b should have the same hash code and b and c should have the same hash code, but not a and c? How should this be possible?

I would propose to add an extra method is_close_to and implement the logic there.

Edit: @JLPeyret points out, that you can use a grid and compute the hash value of a point corresponding to the quadrant of the grid that contains the point. In this case it is possible, that two near points are close to the division of a grid quadrant and therefore assigned with different hash values. If such a probabilistic approach works for you, take a look at locality-sensitive hashing.

Querenker
  • 2,242
  • 1
  • 18
  • 29
  • What if you conceptualize this as a discrete grid and, by `rounding` of x,y,z assign each point, on its own merits- not relative to others- to a cell in the grid? Hash on the grid location. If a == b then that means they are in the same cell. b==c means c is in same cell too. – JL Peyret Mar 13 '20 at 17:11
  • @JLPeyret Thanks for the input, I have added a addition to my answer. – Querenker Mar 13 '20 at 17:34
  • 1
    Thanks a lot you all for your inputs, I'm sort of new to these things but you gave me a lot to think about! – Enrico Agostini Mar 17 '20 at 17:02
0

Instead of giving your points an 'invalid' __eq__ method, either give them an isclose method, using the code you already have, and use that instead of == or make them equal in the standard sense by rounding the coordinates.

Terry Jan Reedy
  • 18,414
  • 3
  • 40
  • 52
  • Thanks for your answer, rounding the number is one of the solutions I'm considering. I just want to make shure the code doesn't mess up! – Enrico Agostini Mar 17 '20 at 17:04