0

I am a bit disappointed by this. I've defined an Event class (two events are considered equal if their coordinates are close and they happen on the same date):

import datetime

class Event(object):
    def __init__(self, fixed_date, longitude, latitude):
        self.fixed_date = fixed_date
        self.longitude = longitude
        self.latitude = latitude
        self.tolerance = 0.000044915764206/5 # this is roughly one meter
    def __eq__(self, other):
        if not isinstance(other, Event):
            return False
        return self.fixed_date == other.fixed_date and abs(self.longitude - other.longitude)<self.tolerance and abs(self.latitude - other.latitude)<self.tolerance
    def __ne__(self, other):
        return not self.__eq__(other)
    def __hash__(self):
        return hash((self.fixed_date, round(self.longitude,6), round(self.latitude, 6)))

e1 = Event(datetime.date(2020, 5, 14), 0.1, 0.2)
e2 = Event(datetime.date(2020, 5, 14), 0.1, 0.200001)

Then

e1 == e2  # ==> True
print(set([e1]) - set([e2])) # ==> {<__main__.Event object at 0x7f5bb5c3d898>}
print(len(set([e1]) - set([e2]))) # ==> 1!!

I of course expected set([e1]) - set([e2]) to be the empty set. How can I achieve my goal of getting the empty set here? (Currently the length of the set difference is 1.)

Juan Chô
  • 2
    A better solution to resolve the discrepancy between `__eq__` and `__hash__` would be to use `self.tolerance` in the calculation inside `__hash__`, so that it conforms with what `__eq__` uses to compare two instances. Alternatively, just have `__eq__` compare the `__hash__()` of `self` and `other` to ensure that this kind of mismatch does not occur. – metatoaster Mar 17 '21 at 12:46
  • Yep, that looks like the equality comes from the hash and not from the `__eq__` operator. I'm going to think of a way to relate the two, as this mismatch is an annoying bug in my code. Thank you – Juan Chô Mar 17 '21 at 12:50
  • More specifically, the way you are currently calculating the hash is akin to fitting a grid over the planet. Two positions that fall in different grid cells will not be counted as identical by the hash, even if they are within "one meter" of each other, while the distance comparison (via equality) would treat them as equal. – metatoaster Mar 17 '21 at 12:50
  • 1
    Sets are based on individual not relative value – the closest equivalent would be resolution, not tolerance. You can make "all objects at position +/- tolerance" the same, you cannot make "these two objects of relative position < tolerance" the same. Consider what would happen if you added *three* objects with pairwise difference below tolerance – it would be ambiguous which ones to include and which ones not, and anything from 1 to 3 entries would be possible. – MisterMiyagi Mar 17 '21 at 12:51
  • 1
    This is one of the reasons why using a tolerance in `==` is a terrible idea. If you want a comparison operation with a tolerance, don't make that operation `==`. – user2357112 Mar 17 '21 at 13:39
  • 1
    @JuanChô That's not the case. ``float`` equality is based on exact equality of values, not on *tolerance*; it's the float values themselves which only have a limited *resolution*. Comparison with tolerance does not work via equality but via [explicit tolerance comparison](https://stackoverflow.com/questions/5595425/what-is-the-best-way-to-compare-floats-for-almost-equality-in-python). – MisterMiyagi Mar 17 '21 at 15:31
  • @JuanChô: All sorts of stuff breaks when you make `==` non-transitive. This is only one example. There's no good way to base a deduplication routine on a non-transitive concept of equality. – user2357112 Mar 17 '21 at 15:32
  • @user2357112supportsMonica certainly but if you read my question carefully I only need reflexivity. – Juan Chô Mar 17 '21 at 15:35
  • 1
    It does not matter whether you only need reflexivity. A ``set`` relies on transitivity. – MisterMiyagi Mar 17 '21 at 15:37
  • That's incredible. I only need reflexivity. If one event is one meter away from another, and that one is one meter away from a third (so a==b and b==c, but a is perhaps 2 meters away from c), it is perfectly fine for me to have a==b, b==c and a!=c; that is actually what I am looking for. I am doing neither CS nor logic; I want to solve a practical problem here, and simple, positive, informative comments are better than useless theoretical info. – Juan Chô Mar 17 '21 at 15:49
  • 2
    If you were trying to build a house in midair, someone told you gravity doesn't work like that, and you replied "I'm not doing physics", would you expect the physics issues to go away? Same thing here. The fact that you don't care about the fundamental issues with your approach doesn't mean they go away. – user2357112 Mar 17 '21 at 15:55
  • Ha ha ha @user2357112supportsMonica, you are going to unify all the forces before building a home. Good luck! – Juan Chô Mar 17 '21 at 16:29
  • The algorithm that is absolutely fundamental to how `set` operates does not support logic in which `a==b and b==c and a!=c` is `True`. I assume you are trying to use `set` as a shortcut to remove elements (e.g. remove a and b gets removed while c remains; remove b and both a and c are gone). This cannot be achieved because of the logic that is foundational to sets. You can create a system where equality is intransitive, but this requires a deep understanding of maths and logic in order for it to be sane, which you are eschewing, so there's no helping that. – metatoaster Mar 17 '21 at 23:41
  • If the scenario for the short cut is actually what you want, you probably could use a graph instead, where each of these `Event` instances will be in a container, and the graph will have to connect "close enough" nodes so that removal of one node will result in the removal of all nodes that were connected to it, and no further. – metatoaster Mar 17 '21 at 23:44
  • Someone [asked](https://stackoverflow.com/questions/25282669/storing-coordinates-in-a-smart-way-to-obtain-the-set-of-coordinates-within-a-cer) a similar question some time ago, and the answer contains a solution that involves the use of [m-trees](https://en.wikipedia.org/wiki/M-tree), which actually have the properties you wanted, namely range queries (i.e. find and remove all points within one meter of this point). – metatoaster Mar 17 '21 at 23:58
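
The comments above suggest two separate ideas: make `==` and `hash()` agree by using a fixed resolution, and keep any tolerance check out of `==`. A minimal sketch of that combination follows. The class name `GridEvent`, the `DECIMALS` constant, and the `is_close` method are hypothetical names chosen for illustration; they are not from the question.

```python
import datetime


class GridEvent:
    """Hypothetical variant of Event: == and hash() share one rounded key."""

    DECIMALS = 5  # assumed resolution: 1e-5 degrees, roughly one meter of latitude
    TOLERANCE = 0.000044915764206 / 5  # tolerance value taken from the question

    def __init__(self, fixed_date, longitude, latitude):
        self.fixed_date = fixed_date
        self.longitude = longitude
        self.latitude = latitude

    def _key(self):
        # Snap each coordinate to a grid cell; equal keys mean "same cell".
        return (
            self.fixed_date,
            round(self.longitude, self.DECIMALS),
            round(self.latitude, self.DECIMALS),
        )

    def __eq__(self, other):
        if not isinstance(other, GridEvent):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Same key as __eq__, so equal objects always hash equally.
        return hash(self._key())

    def is_close(self, other):
        # Pairwise tolerance check, kept out of == because it is not transitive.
        return (
            self.fixed_date == other.fixed_date
            and abs(self.longitude - other.longitude) < self.TOLERANCE
            and abs(self.latitude - other.latitude) < self.TOLERANCE
        )


day = datetime.date(2020, 5, 14)
e1 = GridEvent(day, 0.1, 0.2)
e2 = GridEvent(day, 0.1, 0.200001)
print({e1} - {e2})  # set()

# Two nearby points on opposite sides of a cell boundary are still "different".
e3 = GridEvent(day, 0.1, 0.200004)
e4 = GridEvent(day, 0.1, 0.200006)
print(e3 == e4, e3.is_close(e4))  # False True
```

The last two lines show the trade-off the comments describe: a grid gives sets a consistent notion of equality, but it cannot express "within one meter of each other" for every pair. That kind of query needs an explicit check such as `is_close`, or a spatial index.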

1 Answer

1

Your objects are different based on their hashes, so the set operations are doing the correct thing. Specifically, round(0.200001, 6) == 0.200001, which differs from round(0.2, 6) == 0.2. Setting the rounding to 5 or the value to 0.2000001 does what you were expecting.
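
As a quick check of the rounding behaviour described here, the snippet below rebuilds the tuples that the question's `__hash__` passes to `hash()`. The variable names are illustrative only.

```python
import datetime

day = datetime.date(2020, 5, 14)

# The tuples hashed for e1 and e2 in the question, with rounding to 6 places.
key1 = (day, round(0.1, 6), round(0.2, 6))
key2 = (day, round(0.1, 6), round(0.200001, 6))
print(key1 == key2)  # False: the latitudes stay distinct at 6 decimal places

# Either change mentioned in the answer makes the rounded latitudes match.
same_with_5_places = round(0.200001, 5) == round(0.2, 5)
same_with_closer_value = round(0.2000001, 6) == round(0.2, 6)
print(same_with_5_places, same_with_closer_value)  # True True
```

Because the two tuples differ, the set has no reason to call `__eq__` on the pair unless their hashes happen to collide, which is why `e1 == e2` being `True` has no effect on the set difference.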

Kemp