24

I'm new to Python, and I wanted to make sure that I overrode __eq__ and __hash__ correctly, so as not to cause painful errors later:

(I'm using Google App Engine.)

class Course(db.Model):
    dept_code = db.StringProperty()
    number = db.IntegerProperty()
    title = db.StringProperty()
    raw_pre_reqs = db.StringProperty(multiline=True)
    original_description = db.StringProperty()

    def getPreReqs(self):
        return pickle.loads(str(self.raw_pre_reqs))

    def __repr__(self):
        title_msg = self.title if self.title else "Untitled"
        return "%s %s: %s" % (self.dept_code, self.number, title_msg)

    def __attrs(self):
        return (self.dept_code, self.number, self.title, self.raw_pre_reqs, self.original_description)

    def __eq__(self, other):
        return isinstance(other, Course) and self.__attrs() == other.__attrs()

    def __hash__(self):
        return hash(self.__attrs())

A slightly more complicated type:

class DependencyArcTail(db.Model):
    ''' A list of courses that is a pre-req for something else '''
    courses = db.ListProperty(db.Key)

    ''' a list of heads that reference this one '''
    forwardLinks = db.ListProperty(db.Key)

    def __repr__(self):
        return "DepArcTail %d: courses='%s' forwardLinks='%s'" % (id(self), getReprOfKeys(self.courses), getIdOfKeys(self.forwardLinks))

    def __eq__(self, other):
        if not isinstance(other, DependencyArcTail):
            return False

        for this_course in self.courses:
            if not (this_course in other.courses):
                return False

        for other_course in other.courses:
            if not (other_course in self.courses):
                return False

        return True

    def __hash__(self):
        return hash((tuple(self.courses), tuple(self.forwardLinks)))

Everything look good?

Updated to reflect @Alex's comments

class DependencyArcTail(db.Model):
    ''' A list of courses that is a pre-req for something else '''
    courses = db.ListProperty(db.Key)

    ''' a list of heads that reference this one '''
    forwardLinks = db.ListProperty(db.Key)

    def __repr__(self):
        return "DepArcTail %d: courses='%s' forwardLinks='%s'" % (id(self), getReprOfKeys(self.courses), getIdOfKeys(self.forwardLinks))

    def __eq__(self, other):
        return isinstance(other, DependencyArcTail) and set(self.courses) == set(other.courses) and set(self.forwardLinks) == set(other.forwardLinks)

    def __hash__(self):
        return hash((tuple(self.courses), tuple(self.forwardLinks)))
Machavity
  • 30,841
  • 27
  • 92
  • 100
Nick Heiner
  • 119,074
  • 188
  • 476
  • 699

1 Answers1

18

The first one is fine. The second one is problematic for two reasons:

  1. there might be duplicates in .courses
  2. two entities with identical .courses but different .forwardLinks would compare equal but have different hashes

I would fix the second one by making equality depend on both courses and forward links, but both changes to sets (hence no duplicates), and the same for hashing. I.e.:

def __eq__(self, other):
    if not isinstance(other, DependencyArcTail):
        return False

    return (set(self.courses) == set(other.courses) and
            set(self.forwardLinks) == set(other.forwardLinks))

def __hash__(self):
    return hash((frozenset(self.courses), frozenset(self.forwardLinks)))

This of course is assuming that the forward links are crucial to an object's "real value", otherwise they should be omitted from both __eq__ and __hash__.

Edit: removed from __hash__ calls to tuple which were at best redundant (and possibly damaging, as suggested by a comment by @Mark [[tx!!!]]); changed set to frozenset in the hashing, as suggested by a comment by @Phillips [[tx!!!]].

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • 1
    @Alex: doesn't that hash depend on the order of the elements in `tuple(set(self.courses))`, which might be somewhat arbitrary? – Mark Dickinson Jun 19 '10 at 21:00
  • @Mark, it's arbitrary but not capricious -- though I'm not 100% sure that arbitrarily shuffled lists with the same items would produce the same-ordered set (it may vary with versions of Python), so it's best to avoid the `tuple` calls I lazily left in -- let me edit accordingly, thanks. – Alex Martelli Jun 19 '10 at 21:25
  • 4
    I think you should use `frozenset` instead of `set`, the latter being unhashable. – Philipp Jun 19 '10 at 21:37
  • when you override equal to should not you recommend the not equal to part be also overridden for consistent behavior http://stackoverflow.com/questions/390250/elegant-ways-to-support-equivalence-equality-in-python-classes ? – Alex Punnen Feb 29 '16 at 05:09
  • This is an incomplete answer. Please check [here](http://stackoverflow.com/questions/390250/elegant-ways-to-support-equivalence-equality-in-python-classes/25176504#25176504). Few points: 1. You should implement `__ne__` (and probably use [total_ordering](https://docs.python.org/2/library/functools.html#functools.total_ordering). 2. When you don't know how to equal, you should return `NotImplemented` – Guy Sep 08 '16 at 13:31