5

Is python's hash function portable?

By "portable" I mean, will it return the same results (for the same data) across python versions, platforms and implementations?

If not, is there any alternative to it that provides such features (while still capable of hashing common data-structures)?


The documentation is not particularly helpful. This question refers a library that seems to roll its own version, but I'm not sure non-portability would be the reason for it.

Community
  • 1
  • 1
loopbackbee
  • 21,962
  • 10
  • 62
  • 97
  • `hash` is just for use with the `dict` type, which is implemented as a hash table. You probably want a function from the `hashlib` module in the standard library. – chepner Jul 01 '15 at 20:14
  • @chepner I'm looking for something that hashes arbitrary data structures like `hash`, I don't think `hashlib` does that? – loopbackbee Jul 01 '15 at 20:16
  • `hash` can't hash arbitrary data structures, either. (Try passing a `dict` or a `set` to `hash`.) – chepner Jul 01 '15 at 20:22
  • @chepner Yes, I'm aware only simple and immutable structures are supported. That would be sufficient for my needs (though maybe not everyone else's) – loopbackbee Jul 01 '15 at 20:25

1 Answers1

4

No, hash() is not guaranteed to be portable.

Python 3.3 also uses hash randomisation by default, where certain types are hashed with a hash seed picked at start-up. Hash values then differ between Python interpreter invocations.

From the object.__hash__() documenation:

By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

See also PYTHONHASHSEED.

Python 2.6.8 and 3.2.3 and newer support the same feature but have it normally disabled.

Python 3.2 introduced a sys.hash_info named tuple that gives you details about the hash implementation for the current interpreter.

If you need a portable hash, there are plenty of implementations. The standard library includes a cryptographic hash library called hashlib; these implementations are definitely portable. Another option would be the mm3 package which provides Murmur3 non-cryptographic hash function implementations.

Common data structures would need to be converted to bytes first; you could use serialisation for that, like the json or pickle modules.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • That's an excellent response to the main question. As to an alternative, I've [dug a bit more](http://stackoverflow.com/questions/5417949/computing-an-md5-hash-of-a-data-structure), and it looks like serializing to json is a reasonable alternative. If you don't mind me editing your answer, I could include that information – loopbackbee Jul 01 '15 at 20:22
  • @goncalopp: serialising to JSON won't give you a compact hash; you'd still have to run that through some kind of hashing algorithm. I've added a pointer to `hashlib`. – Martijn Pieters Jul 01 '15 at 20:24
  • Yes, you can then run it through hashlib, as in [this answer](http://stackoverflow.com/a/10288255/1595865). Note I've edited the question to clarify I'm looking for something that hashes data-structures as `hash` does, not just strings – loopbackbee Jul 01 '15 at 20:28
  • @goncalopp: then convert those data structures to bytes; JSON can do that, but so can the `pickle` module. – Martijn Pieters Jul 01 '15 at 20:32
  • `pickle` generates different output on different python versions, so it wouldn't be a "portable hash". The JSON serialization is [easy enough to control](https://docs.python.org/2/library/json.html#basic-usage) in order to get something [reasonably canonical](http://wiki.laptop.org/go/Canonical_JSON) – loopbackbee Jul 01 '15 at 20:48
  • @goncalopp: depends on the pickle version you pick, actually. – Martijn Pieters Jul 01 '15 at 20:49