Why doesn't Python's hash() function give the same values when run on the Android implementation?

Question

I believed that the hash() function works the same way in all Python interpreters, but it behaves differently when I run it on my phone using Python for Android. I get the same hash values for strings and numbers, but when I hash built-in data types the hash values differ.

PC Python Interpreter (Python 2.7.3)

>>> hash(int)
31585118
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101

Mobile Python Interpreter (Python 2.6.2)

>>> hash(int)
-2146549248
>>> hash("hello sl4a")
1532079858
>>> hash(101)
101

Can anyone tell me whether this is a bug or whether I have misunderstood something?

I don't know why it differs for `hash()` but maybe you could use base64 instead: http://docs.python.org/2/library/base64.html — gitaarik, Jun 19 '13 at 13:27
@rednaw thanks, but I just want to know is it normal to have different hash values. — Balakrishnan, Jun 19 '13 at 13:36
You should never rely on the hash value being constant between different interpreters. There's nothing in the spec to guarantee that behavior. the only guarantee is that the hash value will always be the same on a particular run of a particular interpreter. — mgilson, Jun 19 '13 at 13:36
Is this a duplicate of http://stackoverflow.com/questions/793761/built-in-python-hash-function ? — David Cary, Aug 16 '15 at 13:55

John La Rooy · Answer 1 · 2013-06-19T13:42:20.820

39

In recent versions (Python 3.3+), hash() of str and bytes values is randomised by default each time you start a new interpreter instance, to prevent dictionary-insertion DoS attacks.

Prior to that, hash() was different for 32-bit and 64-bit builds anyway.

If you want something that hashes to the same thing every time, use one of the hashes in hashlib:

>>> import hashlib
>>> hashlib.algorithms
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512')
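
A minimal sketch of that suggestion, written for Python 3, where text has to be encoded to bytes before hashing. `stable_hash` is a hypothetical helper name, and SHA-256 is just one choice from the algorithms listed above:

```python
import hashlib

def stable_hash(text):
    """Return a hex digest that depends only on the characters in `text`."""
    # hashlib operates on bytes, so encode the string with a fixed encoding.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

digest = stable_hash("hello sl4a")
print(digest)  # same 64-character hex string on every run and platform
```

Unlike the built-in hash(), the digest does not depend on the interpreter build or on hash randomisation.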

edited Jun 19 '13 at 13:42

answered Jun 19 '13 at 13:34

John La Rooy


For strings ... I'm not sure that they've added that randomization to integers or other builtin types .. (but I was about to make this point as well) – mgilson Jun 19 '13 at 13:35
But, none of the hashlib algorithms hashes data types. – Balakrishnan Jun 19 '13 at 13:59
How can you convert them to strings? `pickle` perhaps? – John La Rooy Jun 19 '13 at 14:02
The hash functions in hashlib are cryptographic and due to performance are not a good solution in all situations. – martinkunev May 31 '18 at 20:16
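
One way to act on the comments above, sketched under assumptions: serialise the value to a canonical byte string first, then hash that. The helper names are hypothetical. JSON is used here rather than `pickle` because the default pickle protocol has changed between Python versions, and `zlib.crc32` is shown as a cheaper, non-cryptographic alternative for cases where speed matters more than collision resistance.

```python
import hashlib
import json
import zlib

def canonical_bytes(value):
    # sort_keys makes dict ordering irrelevant; fixed separators avoid
    # optional whitespace, so equal values always serialise identically.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")

def stable_digest(value):
    # Cryptographic digest of the canonical form.
    return hashlib.sha256(canonical_bytes(value)).hexdigest()

def fast_checksum(value):
    # Non-cryptographic 32-bit checksum; masked to an unsigned value.
    return zlib.crc32(canonical_bytes(value)) & 0xFFFFFFFF

a = {"name": "sl4a", "ids": [1, 2, 3]}
b = {"ids": [1, 2, 3], "name": "sl4a"}  # same content, different key order

print(stable_digest(a) == stable_digest(b))  # True
print(fast_checksum(a) == fast_checksum(b))  # True
```

This only covers values that JSON can represent; other objects would need their own canonical string form.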

andrew cooke · Accepted Answer · 2013-06-19T15:41:34.900

for old python (at least, my Python 2.7), it seems that

hash(<some type>) = id(<type>) / 16

and for CPython id() is the address in memory - http://docs.python.org/2/library/functions.html#id

>>> id(int) / hash(int)
16
>>> id(int) % hash(int)
0

so my guess is that the Android port has some strange convention for memory addresses?

anyway, given the above, hashes for types (and other built-ins i guess) will differ across installs because the objects live at different addresses.

in contrast, hashes for values (what i think you mean by "non-internal objects") (before the random stuff was added) are calculated from their values and so likely repeatable.

PS but there's at least one more CPython wrinkle:

>>> for i in range(-1000,1000):
...     if hash(i) != i: print(i)
...
-1

there's an answer here somewhere explaining that one...
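
a short sketch of that wrinkle, assuming CPython: -1 is reserved as the error return value of the C-level hash functions, so an object whose hash would be -1 is reported as -2 instead.

```python
# CPython-specific behaviour: hash() never returns -1, because -1 signals
# an error at the C level. Small integers otherwise hash to themselves.
mismatches = [i for i in range(-1000, 1000) if hash(i) != i]
print(mismatches)          # [-1] on CPython

# As a consequence, -1 and -2 share a hash value.
print(hash(-1), hash(-2))  # -2 -2 on CPython
```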

Yes I accept with you but how hash will work for non-internal objects. — Balakrishnan, Jun 19 '13 at 14:16
In android `>>> id(int) / hash(int)` gives -2L and `>>> id(int) % hash(int)` gives -2144680448L — Balakrishnan, Jun 19 '13 at 14:21
@andrewcooke wow hash(-1)=-2 really irritated me. In case someone is wondering, the question regarding it is here: http://stackoverflow.com/questions/10130454/why-do-1-and-2-both-hash-to-2-in-python — Jan, Jun 27 '16 at 18:54
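
The quotient and remainder reported for Android in the comment above are enough to reconstruct id(int) on that build. The sketch below (Python 3 syntax; `to_signed32` is a hypothetical helper) suggests that on the Android Python 2.6.2 build hash(int) is simply id(int) reinterpreted as a signed 32-bit integer, rather than id(int) / 16. This is an inference from the posted numbers, not something confirmed in the thread.

```python
android_hash = -2146549248   # hash(int) from the question
quotient = -2                # id(int) / hash(int) from the comment
remainder = -2144680448      # id(int) % hash(int) from the comment

# Python's floor division guarantees: id == quotient * hash + remainder
android_id = quotient * android_hash + remainder

def to_signed32(n):
    """Reinterpret the low 32 bits of n as a two's-complement signed integer."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n

print(hex(android_id))                          # an address above 0x80000000
print(to_signed32(android_id) == android_hash)  # True
```

An address with the top bit set turns negative when stored in a signed 32-bit value, which would explain the negative hash without any unusual memory convention.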

score 1 · Answer 3 · answered Jun 19 '13 at 13:31

Hashing of things like int relies on id(), which is not guaranteed constant between runs or between interpreters. That is, hash(int) will always produce the same result during a program's run, but might not compare equal between runs, either on the same platform or on different platforms.

BTW, while hash randomization is available in Python 2, it's disabled by default there (it is enabled by default from Python 3.3 on). Since your strings and numbers hash identically on both interpreters, it's clearly not the issue here.

Sneftel

Hash randomization is only disabled by default on old versions of Python. For Python 3.3 and later it is enabled by default. – Duncan Jun 19 '13 at 14:05
@Sneftel Why does Python return the same hash value for objects with the same value but different ids? I would expect ``x='hello world'`` and ``y='hello world'``, which have different ids, to have different hash values. – ado sar Dec 16 '21 at 12:10
@adosar "just hash the id" is the default behavior of `hash` for things like `object` and `type` because there's no more generally useful default behavior. For things which *can* have their value hashed in a reasonable fashion (like strings and integers) it hashes the value, not the identity. – Sneftel Dec 16 '21 at 12:15
Remember, `hash` hashing strings by value is the only reason you can reliably do something like `mymap["foo"] = 3` and expect to access the value at `mymap["foo"]` later. – Sneftel Dec 16 '21 at 12:16
@Sneftel In the case of ``mymap["foo"] = 3``, why must the value be used? For example, if a dictionary has a key ``"foo"``, then its id is unique, so why must the value be used instead? I mean, the key is not a variable that might later point to some other object. – ado sar Dec 16 '21 at 12:57
Because the literal string `"foo"` which you use on line 10 is not (guaranteed to be) the same object as the literal string `"foo"` which you use on line 20. If strings were not hashed by value, each instance of `"foo"` in your code could have a different hash. – Sneftel Dec 16 '21 at 13:01
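
The distinction drawn in these comments can be sketched as follows. The `Point` class is a hypothetical example, and the identity-based default shown for it is how CPython treats classes that define neither `__eq__` nor `__hash__`:

```python
class Point(object):
    """Defines neither __eq__ nor __hash__, so both fall back to identity."""
    def __init__(self, x, y):
        self.x, self.y = x, y

s1 = "hello world"
s2 = " ".join(["hello", "world"])   # equal value, built at run time
# Equal strings must hash equally, whether or not they are the same object.
print(s1 == s2, hash(s1) == hash(s2))   # True True

p1, p2 = Point(1, 2), Point(1, 2)
print(p1 == p2)                 # False: default equality compares identity
print(hash(p1) == hash(p1))     # True: stable for one object during a run

# Value-based hashing is what makes a separately built key find the entry.
mymap = {"foo": 3}
key = "".join(["f", "oo"])
print(mymap[key])               # 3
```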

score 1 · Answer 4 · answered Jun 19 '13 at 13:58

With CPython, for efficiency reasons, hash() on internal objects returns the same value as id(), which in turn returns the memory location ("address") of the object.

From one CPython-based interpreter to another, the memory location of such an object is subject to change. Depending on your OS, it can also change from one run to another.

Sylvain Leroux

This is not true anymore, see http://stackoverflow.com/questions/11324271/what-is-the-default-hash-in-python. – laike9m Aug 24 '15 at 12:59

score 0 · Answer 5 · answered May 17 '18 at 09:35

Since Python 3.3, the default hash algorithm produces hash values that are salted with a random value, which differs even between Python processes on the same machine.

Hash randomization is currently implemented only for strings (str and bytes), since they were considered the data type most likely to be captured from outside and used in an attack.

By contrast, a frozenset whose elements are not strings (for example, a frozenset of integers) consistently produces the same hash value across different processes.

Source: https://www.quora.com/Do-two-computers-produce-the-same-hash-for-identical-objects-in-Python
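
A sketch of how this can be checked on Python 3.3+, assuming the `PYTHONHASHSEED` environment variable (which CPython documents for controlling the salt). `hash_in_child` is a hypothetical helper that evaluates an expression in a fresh interpreter process:

```python
import os
import subprocess
import sys

def hash_in_child(expr, seed):
    """Return hash(<expr>) as computed by a new interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "print(hash({}))".format(expr)
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return int(out.decode().strip())

text = "'hello sl4a'"
ints = "frozenset([1, 2, 3])"

# Same seed: the string hash is reproducible across processes.
print(hash_in_child(text, 1) == hash_in_child(text, 1))   # True
# Different seeds: the string hash almost always differs.
print(hash_in_child(text, 1) == hash_in_child(text, 2))
# Integers are not salted, so a frozenset of them is unaffected by the seed.
print(hash_in_child(ints, 1) == hash_in_child(ints, 2))   # True
```

Setting `PYTHONHASHSEED=0` disables the salt entirely, which can help when debugging, but the resulting values still depend on the interpreter build.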
