17

How does a hash value of some particular string is calculated in CPython2.7?

For instance, this code:

print hash('abcde' * 1000)

returns the same value even after I restart the Python process and try again (I did it many times).

So, it seems that id() (memory address) of the string doesn't used in this computation, right? Then how?

d-d
  • 1,775
  • 3
  • 20
  • 29
  • 3
    "returns the same value even after I restart the Python process and try again" - not guaranteed, and usually not true on Python 3. "it seems that id() (memory address) of the string doesn't used in this computation" - well, of course not. Otherwise, we wouldn't have the invariant that `a == b` implies `hash(a) == hash(b)`. – user2357112 Oct 28 '16 at 04:30
  • I think you need to run `help(hash)` and `help(id)` to understand the difference between the two because they are not the same... – Tadhg McDonald-Jensen Oct 28 '16 at 04:31
  • 1
    Maybe this thread will shed some light? http://stackoverflow.com/questions/6008026/how-hash-is-implemented-in-python-3-2 – Savir Oct 28 '16 at 04:32
  • @BorrajaX, thanks, I'll take a look – d-d Oct 28 '16 at 04:35
  • Also relevant: https://stackoverflow.com/q/19580412/1959808 – 0 _ Jan 18 '18 at 07:21

1 Answers1

24

Hash values are not dependent on the memory location but the contents of the object itself. From the documentation:

Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0).

See CPython's implementation of str.__hash__ in:

Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
Selcuk
  • 57,004
  • 12
  • 102
  • 110
  • 2
    thanks. so it seems, that the hash(string) will be the same from run to run, right? – d-d Oct 28 '16 at 04:34
  • 19
    @d-d No, it is not guaranteed to be the same every time, but it is guaranteed to return the same value within the same process. If you want a non-changing hash, use [hashlib](https://docs.python.org/3/library/hashlib.html) functions instead. – Selcuk Oct 28 '16 at 04:35
  • yeah, I thought about it, but I need something faster than any hash function in this module. may be murmur hash or so.. – d-d Oct 28 '16 at 04:38
  • 1
    Check out various hash functions [here](http://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed) – Selcuk Oct 28 '16 at 04:40
  • 1
    @Lampe2020 See my comment above. It won't even return the same value when you launch a new Python process. – Selcuk Jun 13 '23 at 00:08