0

I have a class that essentially wraps a list, and lists apparently can't have hash values. My idea was to generate a random number and store that as the hash value.

alexgolec
  • 26,898
  • 33
  • 107
  • 159
  • 2
    Do you mean you get a _TypeError: unhashable type_ if you try to use your class as a dictionary key (for example)? See [this previous answer](http://stackoverflow.com/questions/1957396/why-dict-objects-are-unhashable-in-python) why that's the case. Could you provide some more information on what you're trying to do? – SteveMc Feb 18 '11 at 21:45

4 Answers4

3

If you have two instances that are equivalent then they must return the same __hash__ value. If you generate one randomly you can't guarantee this will be the case, so you will get strange behaviour.

Can you use a tuple instead of a list? Can you just ignore the list in the computation of the hash? It's OK if non-equal objects have a collision.

You should see DictionaryKeys about why lists aren't allowed to have hash values.

sjr
  • 9,769
  • 1
  • 25
  • 36
  • Or, possibly "freeze" the list for this to work at all. That usually means a shallow copy from the class that wraps a list to a very similar class that wraps a tuple. Often with a method name of `freeze()`. – S.Lott Feb 18 '11 at 21:59
  • You can do this just by using `tuple(yourlist)`, it will just take a copy of the list. It's immutable though, don't know if the OP needs it to be mutable. – sjr Feb 18 '11 at 22:09
  • If you generate the hash in the constructor and store it as member data, every call to the hash method will naturally return the same value from that member. –  Feb 18 '11 at 22:22
  • @Steve314 - how does that solve the problem of enforcing the `__hash__` contract? – sjr Feb 18 '11 at 22:28
  • @sjr - The same instance always has the same hash value. As I say in my answer, that might be valid (identifying the instance, so the "value" you're looking up is the reference) - though it leads to the question "why?". The only sane answer I can think of is a set of instances, so you don't add the same instance (irrespective of value) twice. An odd requirement, but not impossible. –  Feb 18 '11 at 22:45
  • @sjr - IOW, sometimes "same instance" is a useful definition of "equivalent", often called (as someone else already mentioned) referential equality. –  Feb 18 '11 at 22:47
1

Not a good idea. The general contract of hash code is that if Object A equals Object B, A.hashCode() equals B.hashCode(). With what you're proposing this wont hold.

You could try using

  • list length as the hash code
  • hash code of the first item in the list as the hashcode
  • sum of all hashcodes of all items as the hashcode

or something else along these lines.

iluxa
  • 6,941
  • 18
  • 36
  • see the link in @sjr's answer on why even that's not a good idea, even though it technically satisfies the contract. – Davy8 Feb 18 '11 at 21:55
1

As mentioned by others, a if 2 objects are considered "equal" then they must have the same hash value.

How you define equals for your class is up to you. If you only care about referential equality then you could generate a random number on __init__, but it better be the case that MyWrapper([1,2,3]) == MyWrapper([1,2,3]) is False.

The reason you shouldn't use the contents of the list as a hash like @iluxa suggests is because if you use your class as a key in a dictionary, then change the contents such that the hash value changes, it would not be able to find that key in the dictionary because it stored the old hash value and is trying to look up the new one.

To sum up:

  1. If a == b then hash(a) == hash(b) must be true.
  2. If a != b then hash(a) != hash(b) should be true most of the time for performance of lookups, but it is not required to be the case.
  3. The value of a hash should not change between the time of adding it to a dict (or any other hash-based lookup structure that doesn't recalculate hash values on look-up) and trying to find it.

The last part is often just simplified to the value of the hash should not change ever.

Davy8
  • 30,868
  • 25
  • 115
  • 173
  • I also had similar confusion in the past and asked this question (it's about C# but same principles apply): http://stackoverflow.com/questions/873654/overriding-gethashcode-for-mutable-objects-c – Davy8 Feb 18 '11 at 22:15
0

It is reasonable to support hashing for mutable containers - but the general rule is that you're hashing the current value, not the container.

EDIT That's reasonable in principle, though Python generally guards against it.

There's even a specific hashing technique for this kind of task. Zobrist hashing calculates a hash incrementally, based on each change to the container, in such a way that no matter how you reach a particular value you get the same hash. This is used in chess programs to detect cases where the board reaches the same state through different sequences of moves, as an optimisation for the game tree searching.

The question here is why do you want to find a list using a hash-based lookup?

If the value varies, but you want to find the same container anyway, then this may work (with collision issues) - but it still seems pointless. The random hash value is just a kind of identifier, identifying which object you're referring to. The reference/pointer to the object is a better identifier. And how will you know which hash value to search for if you don't already have a reference to the object anyway?

If you want a collection of containers, a list of lists is a better solution than a dictionary of randomly-hashed lists.

EDIT A possible exception is if you want a collection of lists where the same list may be added more than once, but will only be present once in the container - a set of list instances. In that case, the hash should IMO be based on the reference to the object, but I can't remember how that's done.