4

I have a class called Transaction, which contains multiple attributes. If any of these attributes match, then i want those transactions to be treated as duplicate transactions and hence do not want to store duplicates in a set.

class Transaction:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, Transaction):
            return NotImplemented
        return self.a == other.a or self.b == other.b

    def __hash__(self):
        # TODO

I learnt that it is important to implement both __eq__ as well as __hash__ if we want to avoid duplicates while inserting in a set. Also, if A == B, then their hashes should also match as per the contract.

How can i implement __hash__ in this case, so that if i try to insert a transaction into the set, then it is rejected if it contains repeated value of either attribute 'a' or 'b'.

Thanks in advance!

Yogesh Kumar Gupta
  • 1,009
  • 1
  • 10
  • 10
  • Yes, this is correct. I want the object to be treated as same in case if any attribute matches. – Yogesh Kumar Gupta Sep 30 '21 at 17:47
  • 2
    I feel like there's a good simple way to do this but for the life of me I can't think of one :) Watching this ticket with interest :) – Ben Sep 30 '21 at 17:50
  • do you mean if both `a` and `b` on the same object are the same then it is considered a duplicate? – gold_cy Sep 30 '21 at 17:52
  • @gold_cy no, they mean if *either* `a` or `b` are the same then it is considered *equal* – juanpa.arrivillaga Sep 30 '21 at 17:54
  • when compared to a different `Transaction`, correct? – gold_cy Sep 30 '21 at 17:54
  • Right, So if self.a == other.a OR self.b == other.b, Then self and other should be considered equal. – Yogesh Kumar Gupta Sep 30 '21 at 17:55
  • 1
    equality is supposed to be transitive. (`x == y and y == z => x == z`). But with your definition, it is not the case, because you can have `x.a == y.a and y.b == z.b`, but `z.a != x.a and z.b != x.b` – njzk2 Sep 30 '21 at 18:37
  • seems related to https://stackoverflow.com/questions/45164691/recommended-way-to-implement-eq-and-hash which has the same subject. **But...** with `a and b` rather than `a or b`. – JL Peyret Sep 30 '21 at 18:39

2 Answers2

1

I'm not sure it's possible to compress an or condition like this into a single hash value. I tried experimenting with applying DeMorgan's law (not nand instead of or) but came up empty.

Your best bet for making the type hashable might just be to return a constant value (such that all instances have the same hash), and rely on the hashtable's collision behavior.

This is implicitly allowed by the standards, because the rule is

a == b implies hash(a) == hash(b)

and not

hash(a) == hash(b) implies a == b

which has never been the case (after all, hash collisions are expected to happen occasionally - a hash is only 32 or 64 bits large)


A set will accommodate for this behavior with its natural collision-avoidance behavior, and while this will not at all be performant, it will at least allow you to use the set data structure in the first place.

>>> class A:
...   def __init__(self, prop):
...     self.prop = prop
...   def __repr__(self):
...     return f'A({self.prop})'
...   def __eq__(self, other):
...     return self.prop == other.prop
...   def __hash__(self):
...     return 0
... 
>>> {A(1), A(2), A(3), A(1)}
{A(1), A(2), A(3)}

Admittedly, this kind of defeats the purpose of using a set, though there might be more point to it if you were using your objects as keys in a dict.

Green Cloak Guy
  • 23,793
  • 4
  • 33
  • 53
  • 2
    If we keep the hashsame for all the objects, then using a set or a list will not matter much.. Both will have kind of same performance in that case.. Like i can just see if element is present in list(which will call __eq__) and if not, add into the list. – Yogesh Kumar Gupta Sep 30 '21 at 18:19
0

I think it's not possible and you shouldn't do that. Whatever you use in your __eq__ , should also be present in __hash__, otherwise:

let's say you only use hash of a in your __hash__, you would end up with a scenario that two equal objects have different hashes(because their bs are equal) which contradict the actual rule:

if obj1 == obj2 -> True then hash(obj1) == hash(obj2) "must" be True

Same with using only b in __hash__.

S.B
  • 13,077
  • 10
  • 22
  • 49
  • 1
    Exactly, i understand this, and that is the question. I want to achieve this and is there a way to achieve this by maintaining the contract. – Yogesh Kumar Gupta Sep 30 '21 at 18:09
  • @YogeshKumarGupta Emm so I would say there is no way. let's wait for other responses. – S.B Sep 30 '21 at 18:14