0

We've got a list of instances of a class. We effectively want a Set i.e. a group with no repeated elements. Elements of the list which are the same equate, but their hashes are different as they have been instantiated separately. So a==b is True, a is b is False.

Is there a way to vectorise this problem or otherwise make it efficient. The only solutions we can think of involved for loops, and it seems like there might be a more efficient solution.

EDIT: I think its different from the "Elegant ways to support equivalence" as the equivalence works well, its just that Set relies on comparing hashes.

EDIT: The for loop solution would go something like, sort the list, and then iterate over, removing the current value if its the same as the last value

EDIT: To be clear, we don't own this class, we just have instances of it. So we could wrap the instances and implement a more useful hash function, however, this seems like it might be almost as expensive as the for loop approach - could be wrong though

EDIT: sorry if it feels like I'm moving the goalposts a bit here - there isn't a simple val of the object that can be subbed in for a hash, that approach would need to somehow generate UIDs for each different instance.

Community
  • 1
  • 1
nrob
  • 861
  • 1
  • 8
  • 22
  • Is it possible to serialize the state somehow, e.g. can you convert the class to a tuple which represents its state and then hash that? Are the classes less-than comparable (i.e. sortable)? – John Zwinck Jan 17 '17 at 11:36
  • 1
    Possible duplicate of [Elegant ways to support equivalence ("equality") in Python classes](http://stackoverflow.com/questions/390250/elegant-ways-to-support-equivalence-equality-in-python-classes) – e4c5 Jan 17 '17 at 11:37
  • 1
    Yes, `set` needs its items to be hashable, so you just need to implement a suitable `__hash__` method. And make sure that your objects aren't going to get mutated! – PM 2Ring Jan 17 '17 at 11:46
  • 1
    "Elements of the list which are the same equate, but their hashes are different" You should really think about this again. Is it necessary that the hashes are different, or did you just not bother with implementing `__hash__`? Equal and hash being inconsistent opens up a load of problems, the first of which you just encountered. – tobias_k Jan 17 '17 at 11:49
  • Should that be? > The loop will use O(n2) to scan the entire list, while a lookup via hash should only take O(n) in a set. – nrob Jan 17 '17 at 13:00
  • Even if you have to wrap the instances, implementing `__hash__` should still be much faster then using a loop. The loop will use O(n) to scan the entire list, while a lookup via hash should only take O(1) in a set. Be vary, though, that the hash should not change once the elements have been added to the set. (re-posted with typo fixed) – tobias_k Jan 17 '17 at 13:19

2 Answers2

1

I assume you are working with a class you created yourself and that you've implemented your own equality method.

It's true that the default hash method inherited from Object returns different values for different instances. From what I have read, it's either based on id() or it's randomized, depending on the Python version.

However, you can easily implement your own __hash__ method to solve this.

How to implement a good __hash__ function in python

__hash__ should return the same value for objects that are equal. It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects.

This may not be the answer that you want, but it is a clean and easy way to do it. Then you can just create a Set normally.

Community
  • 1
  • 1
Oersted
  • 175
  • 2
  • 8
0

Maybe this is what you need? Make the hash a function of the class fields.

Here is a simple example:

class A:
    def __init__(self, v):
        self.val = v

    def __eq__(self, other):
        return self.val == other.val

    def __hash__(self):
        return self.val

    def __repr__(self):
        return 'A(%s)' % self.val

a = set([A(2), A(3), A(4), A(2), A(10), A(4)])
print(a)
# {A(10), A(2), A(3), A(4)}
tobias_k
  • 81,265
  • 12
  • 120
  • 179
Israel Unterman
  • 13,158
  • 4
  • 28
  • 35