4

So, this is fun - Python's hash notoriously returns True on hash(-1) == hash(-2), as discussed elsewhere, but what about this?

>>> hash( (-2,2) ) == hash( (2,-2) )
True

Is this a feature?

Some other quick experiments:

>>>(-2,2) == (2,-2)
False
>>>hash( (-1,) ) == hash( (-2,) )
True
>>>hash( (-1,-2) ) == hash( (-2,-1) )
True
>>>hash( (-2.01,2.01) ) == hash( (2.01,-2.01) )
False
>>>hash( (-1,1) ) == hash( (1,-1) ) 
False
Community
  • 1
  • 1
Tom Stephens
  • 475
  • 4
  • 15
  • 1
    What kind of feature would that be? I am inclined to say it's "obviously" not a feature since you want your hash functions not to have these kinds of properties. And what is your question? – Niklas B. Feb 13 '14 at 02:05
  • 1
    By the way, you probably meant to write `(-1,)` because the extra pair of parens is redundant otherwise – Niklas B. Feb 13 '14 at 02:11
  • Well, `hash(-1) == hash(-2)` is a feature in the sense that it's not a bug, and there's a reason it is that way (above link). Does this have a similar explanation - or is it just an oddity? That is my question. – Tom Stephens Feb 13 '14 at 02:15
  • 2
    it's not a bug because hash collisions cannot be avoided. It's certainly not a feature though, merely a workaround/implementation detail. It's very likely to be implementation-dependent as well, so it has little to do with Python and more with the cpython implementation (have you tried it in pypy or ironpython?) – Niklas B. Feb 13 '14 at 02:19
  • 1
    @TomStephens Please check my answer for a comprehensive explanation. – thefourtheye Feb 13 '14 at 05:47

2 Answers2

3

It's not a feature; it's a coincidence. Hash collisions occur.

Python's int hashing is really dumb and its tuple hashing is usually okay.

Python's dict implementation is meant to kick serious butt with bad hashes, so it doesn't matter too much.

Niklas B.
  • 92,950
  • 18
  • 194
  • 224
Mike Graham
  • 73,987
  • 14
  • 101
  • 130
  • 2
    By coincidence, I actually just looked up the way Python did its tuple hashing earlier this week, since I was considering writing an implementation-independent/language-independent hash for JSON data. I found it somewhat interesting: http://hg.python.org/cpython/file/6e04027ed53e/Objects/tupleobject.c#l341 – Mike Graham Feb 13 '14 at 02:24
2

Before looking at the hashing logic, lets see few tidbits.

  1. Hash value of the python integers will be the numbers themselves.

  2. Only exception to point 1 is, -1. Because -1 is internally used to indicate error cases. So, hash value of -1 will be -2. (For -2 also, it will be -2 only.)

With this understanding, we can answer your examples 2, 3 and 5.

Example 2: Hash value of -1 == Hash value of -2. So, it is true.

Example 3: Same as the reason for Example 2

Example 5: Hash value of -1 and 1 are different (-2 and 1 respectively). That's why the result is False.

The case, Example 4, hash value of floating point numbers are not the same as the numbers themselves, because hash values should be integers. And the numbers are not represented as they are in the memory. (2.01 is NOT stored as 2.01 itself in the memory). So, hash values of them being different is acceptable.

To answer the Example 1, lets see some code. As per the tuple hashing implementation, lets calculate the hash value using a Python program

x, hash_mult = int("0x345678", 16), 1000003
x, hash_mult = (x ^ 2) * hash_mult, hash_mult + 82522
x = (x ^ -2) * hash_mult
print(x + 97531)                                   # -3713082714466793269

x, hash_mult = int("0x345678", 16), 1000003
x, hash_mult = (x ^ -2) * hash_mult, hash_mult + 82522
x = (x ^ 2) * hash_mult
print(x + 97531)                                   # -3713082714466793269

print(hash((-2, 2)))                               # -3713082714466793269

Note: This is not only true for -2, 2, it is true for all integers, except the above seen case and all the multiples of 8.

thefourtheye
  • 233,700
  • 52
  • 457
  • 497