Python dataclasses: is it safe to use an id-based hash instead of `unsafe_hash=True`?

Question

I have a dataclass that represents a 2D point:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

I want the Point class to have the following behaviour, so that different point objects can be compared based on value, but also stored separately in a dictionary:

p1 = Point(5, 10)
p2 = Point(5, 10)

p1 == p2  # Should return True
p1 is p2  # Should return False
hash(p1) == hash(p2)  # Should return False, so that they can be stored as different entries in a dict

I could use unsafe_hash=True, e.g.

@dataclass(unsafe_hash=True)
class Point:
    x: int
    y: int

But then equal points collapse into a single key when they are stored in a dictionary, e.g.

p1 = Point(5, 10)
p2 = Point(5, 10)

d = {p1: True, p2: False}
d  # Returns {Point(x=5, y=10): False}

Is there any reason why I shouldn't instead implement the __hash__ method to return a hash based on the id of the object? Similar to the implementation of object.__hash__ (ref).

@dataclass
class Point:
    x: int
    y: int

    def __hash__(self):
        return hash(id(self))

This seems like the simplest solution, and gives the class the desired behaviour, but it seems to go against the advice in the Python docs:

The only required property is that objects which compare equal have the same hash value

Does this answer your question? [How can I make a python dataclass hashable?](https://stackoverflow.com/questions/52390576/how-can-i-make-a-python-dataclass-hashable) — Dainius Preimantas, Dec 30 '22 at 12:11
What is the real problem you are trying to solve? Why do you want two equal objects to have different entries in a dictionary? — Paweł Rubin, Dec 30 '22 at 12:19
@DainiusPreimantas thanks for the suggestion, but unfortunately not. The answer to that question suggests implementing a new `__hash__` method but doesn't go into detail. One of the comments to the answer mentioned using `hash(self.id)`, but this is of course different to `hash(id(self))` — Rob, Dec 30 '22 at 14:14
@PawełRubin - the reason for this is to generate additional data for each point (a bool in my example), and store in a dictionary for quick access (there are a lot of points). There are also many places in my program where the `__eq__` method is useful for `Point`. The point objects are frequently updated, so need to remain mutable, and may be set to the same coordinates as another point. — Rob, Dec 30 '22 at 14:23

score 1 · Accepted Answer · answered Dec 30 '22 at 15:52

You're right that this __hash__ implementation goes against the recommendations in the docs. In fact, it's more than a recommendation; Python's built-in data structures are allowed to rely on that assumption. If you have two equal objects which hash to different values, then indexing into a dictionary whose keys are those points could return the value for either point, or find no match at all, depending entirely on how the dictionary is stored internally. Now, the current Python implementation may or may not actually do this, but other Python implementations could, and a future version of Python might break your code. Bottom line: It's a bad idea.
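
To make the risk concrete, here is a minimal sketch of how the id-based hash from the question behaves on current CPython. The variable names are illustrative only, and the dictionary result is an implementation detail rather than something the language guarantees.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    # The id-based hash from the question: equal points get different hashes.
    def __hash__(self):
        return hash(id(self))


p1 = Point(5, 10)
p2 = Point(5, 10)
d = {p1: True}

points_equal = p1 == p2      # value-based __eq__ generated by @dataclass
found_in_list = p2 in [p1]   # lists compare with ==, so this finds a match
found_in_dict = p2 in d      # dicts check the hash first, so on CPython this misses
```

The same two objects count as equal for a list but as different for a dict, which is exactly the inconsistency the docs warn about.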

Let me propose a different idea. You want two different things, so consider making two different classes. One class can represent the concrete notion of identity (both in __hash__ and in __eq__, for consistency). Then have a separate class for the point-to-point equality.

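from dataclasses import dataclass

# Point is defined first so that PointObject's annotations can refer to it.
@dataclass(frozen=True)
class Point:
  x: int
  y: int

# Keeps the default identity-based __eq__ and __hash__ inherited from object.
class PointObject:
  point: Point

  def __init__(self, point: Point) -> None:
    self.point = point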

Now your dictionary can have PointObject instances as keys, but you can do all of your business logic on the contained Point objects. Obviously, you might pick a better name than "PointObject", depending on what they actually represent in your domain model.
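
As a rough usage sketch of this two-class idea (the `flags` dictionary and the use of `dataclasses.replace` to "move" a point are assumptions for illustration, not part of the original suggestion):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


class PointObject:
    """Wrapper that keeps the default identity-based __eq__ and __hash__."""

    def __init__(self, point: Point) -> None:
        self.point = point


a = PointObject(Point(5, 10))
b = PointObject(Point(5, 10))

flags = {a: True, b: False}       # two entries, because a and b are distinct objects
same_coords = a.point == b.point  # value comparison still works on the wrapped points

# "Moving" a point swaps in a new frozen Point; the wrapper's hash is unaffected.
a.point = replace(a.point, x=7)
still_found = flags[a]
```

Because the dictionary key is the wrapper rather than the coordinates, updating a point never invalidates its entry.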

Thanks Silvio! In hindsight, this makes a lot of sense with the wording in the docs. Using your suggestion without frozen (as I need the points to be mutable) and then using the `PointObject`s as keys for the dictionary has worked for me. — Rob, Jan 03 '23 at 08:51
