
After calculating the hash of an immutable object, say a tuple of int and string elements, does the Python interpreter keep that value in memory, or does it calculate it again each time? If I have code that repeatedly checks whether an object belongs to some collection, such as a set, do I have to take care of caching these hash values myself, or will the interpreter do it for me?

x = ("a", 1)
assert x in {("a", 1), ("b", 2)} # first time hash(x) is calculated
assert x in {("c", 3), ("d", 4)} # will the Python interpreter calculate hash(x) again?

Or let's rephrase the question. The hash method built into Python's native tuple type has O(n) time complexity, where n is the number of elements in the tuple. If we write code that calls that method m times, it will have O(n*m) time complexity. Now the question is: does Python optimize this case internally by caching the hash value, so that in practice it's reduced to O(n + m)?

n = 999_999_999 # big number

x = tuple(i for i in range(n)) # very big tuple, takes a long time to calculate its hash

m = 999_999_999 # another big number

for _ in range(m): # lots of iterations
    hash(x)
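
For reference, one rough way to measure this on a given interpreter (just a sketch using the standard `timeit` module; the sizes are scaled down here so it finishes, and exact numbers will of course vary):

import timeit

n = 1_000_000                       # large enough that hashing the tuple is measurable
x = tuple(range(n))

# if the hash were cached, the second call would be far cheaper than the first
first = timeit.timeit(lambda: hash(x), number=1)
second = timeit.timeit(lambda: hash(x), number=1)
print(f"first hash: {first:.6f}s, second hash: {second:.6f}s")
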
  • In your example, `x` is not a "hash key" (whatever you mean by that). It's just a name that refers to a tuple object. But in any case, `set` members and `dict` keys must be *hashable*, which, by definition, means they *must always hash the same*. Thus, caching would be generally pointless unless some of the objects have a custom `__hash__` that is very expensive to calculate - but that is merely a matter of efficiency, rather than a significant behavioural difference. – ekhumoro Jul 14 '23 at 10:40
  • @ekhumoro I'm not sure where I said that x is a hash key. I asked when the hash key of the object referenced by the variable x is calculated. And my question is specifically about efficiency, not behavior. Because for complex objects, like nested tuples with lots of elements, calculating the same hash key over and over again can be significant. – Kasia Jul 14 '23 at 11:05
  • Your question currently makes no mention of efficiency, nor does it include any code which explicitly calculates a hash (such as a user-defined class) - hence my comment. What evidence do you have that hashing causes *significant inefficiency* in your actual code? Have you done any profiling that proves this? The current code example in your question is clearly totally irrelevant as far as that's concerned, since it only uses basic python types. – ekhumoro Jul 14 '23 at 11:18
  • @ekhumoro It seems that my question was clear enough for the user who gave the first answer. You have an immutable object. You execute an expression that requires calculating its hash. Then you execute a second expression that requires calculating its hash. The question is whether the python interpreter will calculate the hash twice, or whether it will only calculate it the first time, and only pull it out of memory the second time. My code seems to exactly represent my question. – Kasia Jul 14 '23 at 11:32
  • Your code does no such thing, because it does not calculate a hash value: i.e. none of the objects involved define a custom `__hash__`. If you look more carefully at the question linked in the answer below, you will see that it is specifically about caching a custom hash value in a user-defined class. But in any case, this is all just a case of premature optimisation, since you haven't demonstrated a genuine problem. Please therefore edit your question and provide a proper [mre]. – ekhumoro Jul 14 '23 at 11:43
  • @ekhumoro Why do you keep talking about a custom hash method, when I clearly wrote in the question that I mean native types with built-in hash methods? And based on this [thread](https://stackoverflow.com/questions/3949310/how-is-set-implemented), set is implemented as a hashtable, so checking whether an object belongs to one involves calculating the hash for that object. And since when do I have to publish here the code I'm working on to get an answer to a question that doesn't require seeing that code? Even if it were a premature optimization, that wouldn't make the question invalid. – Kasia Jul 14 '23 at 12:03
  • Custom hash methods are the only thing that could possibly be relevant, since that's the only way to control caching of the values. You have no control over built-in types unless you create a subclass of them and reimplement `__hash__`. Are you asking whether this could be worth it for the tuples in your actual code? Well, surely you must appreciate that the answer to this will depend on the specific nature of the tuples involved (thus, the requirement for a genuine example that demonstrates a measurable performance problem). – ekhumoro Jul 14 '23 at 12:26
  • To put it another way: in your question, you ask if you need "to take care of caching these keys myself". But what "keys" are you talking about? For built-in types, the hash values are calculated internally, and therefore not manageable by your own code. Thus, in your example, what exactly is it that you think you could/should cache? – ekhumoro Jul 14 '23 at 12:43
  • See: https://stackoverflow.com/a/7648538/765091 (including a link to the Python bug tracker explaining the history of why strings cache their hashes but tuples don't). But building your own cache of tuple hashes wouldn't help when checking set or dictionary membership: there's no way of telling the set or dict that it should use your cache. – slothrop Jul 14 '23 at 13:02
  • If the language specification does not guarantee one behavior or the other, then your code should not depend on it to behave one way or the other. If you need an object that caches its hash code, then you should create your own custom object that overrides its default `__hash__` and `__eq__` methods. – Solomon Slow Aug 11 '23 at 17:53
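
A minimal sketch of that last suggestion (the wrapper class `CachedHashTuple` is purely illustrative, not anything built in): the expensive hash is computed once in `__hash__` and reused, and because a set calls the member's own `__hash__`, the cached value is what gets used for membership tests.

class CachedHashTuple:
    """Wraps a tuple and caches its hash after the first __hash__ call."""
    __slots__ = ("_data", "_hash")

    def __init__(self, data):
        self._data = tuple(data)
        self._hash = None              # hash not computed yet

    def __hash__(self):
        if self._hash is None:         # pay the O(n) cost only once
            self._hash = hash(self._data)
        return self._hash

    def __eq__(self, other):
        if isinstance(other, CachedHashTuple):
            return self._data == other._data
        return NotImplemented

x = CachedHashTuple(range(1000))
s = {x, CachedHashTuple(range(5))}
assert x in s                          # membership test reuses the cached hash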

2 Answers


According to this comment in the CPython source for tuple objects, no:

/* Tests have shown that it's not worth to cache the hash value, see
https://bugs.python.org/issue9685 */

Python doesn't cache the calculated hash for tuples.


Based on this answer, it depends on the type of `x`: tuples do not cache their hash, while strings do.
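
A rough way to see that difference on your own interpreter (a sketch only; exact timings vary, but on CPython the second hash of the string should be near-instant because `str` objects store their hash, while the tuple costs about the same every time):

import timeit

big_str = "a" * 50_000_000          # str caches its hash after the first call
big_tup = tuple(range(5_000_000))   # tuple recomputes its hash on every call

for name, obj in [("str", big_str), ("tuple", big_tup)]:
    first = timeit.timeit(lambda: hash(obj), number=1)
    second = timeit.timeit(lambda: hash(obj), number=1)
    print(f"{name}: first={first:.6f}s second={second:.6f}s")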
