1

Consider a function f(*x) that takes many arguments *x. Based on these arguments (objects), f composes a rather complex object o and returns it. o implements __call__, so o itself serves as a function. Since composing o is quite time-consuming, and in my scenario there is no point in having multiple instances of o built from the same arguments *x, the results should be cached.

The question is now: how do I efficiently compute a hash from multiple arguments *x? Currently I use a Python dictionary, and I concatenate the str() representations of each x to build each key. It works in my scenario, but it feels rather awkward. I need to call the resulting objects o at a very high frequency, so I suspect the repeated str() calls and string concatenations waste a lot of computation time.

schreon
  • 1,097
  • 11
  • 25
  • What are the types of the arguments in `*x`? – Lukas Graf Jun 09 '14 at 11:03
  • In my case, the arguments have a variety of types. I usually have tuples of integers, some floats and at least one str() representation of a context object coming from another library. – schreon Jun 09 '14 at 11:06
  • Then I would recommend simply calling [`hash()`](https://docs.python.org/2/library/functions.html#hash) on the individual arguments in order to use the same hash Python uses internally to compare dictionary keys for example. – Lukas Graf Jun 09 '14 at 11:10

2 Answers

2

You can use the hash built-in function, combining the hashes of the items in x. The typical way to do this (see e.g. the documentation) is to XOR the hashes of the individual objects:

it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects

To implement that in a functional style, using operator.xor and reduce:

from functools import reduce # only required in Python 3.x
from operator import xor

def hashed(*x):
    return reduce(xor, map(hash, x))

See also this question.
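For example, with mixed argument types like those in the question (values are illustrative), equal argument tuples hash equally; note, though, that XOR is commutative, so permuted arguments collide, which may or may not matter for your use case:

```python
from functools import reduce
from operator import xor

def hashed(*x):
    return reduce(xor, map(hash, x))

# Equal argument tuples produce equal hashes:
assert hashed((1, 2), 3.5, "ctx") == hashed((1, 2), 3.5, "ctx")

# But XOR ignores argument order, so these collide:
assert hashed("a", "b") == hashed("b", "a")
```

If order must matter, hashing the whole argument tuple at once with hash(x) is order-sensitive.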

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
2

Since version 3.2, Python has included an LRU cache implementation that you can use to cache your function's results based on its arguments: functools.lru_cache

Example:

from functools import lru_cache

@lru_cache(maxsize=32)
def f(*args):
    """Expensive function
    """
    print("f(%s) has been called." % (args, ))
    return sum(args)


print(f(1, 2, 3))
print(f(1, 2, 3, 4))
print(f(1, 2, 3))
print(f.cache_info())

Output:

f((1, 2, 3)) has been called.
6
f((1, 2, 3, 4)) has been called.
10
6
CacheInfo(hits=1, misses=2, maxsize=32, currsize=2)

(Notice how f(1, 2, 3) was only actually called once.)

As suggested in the comments, it's probably best to simply use the hash()es of your arguments to build the cache key - and that's exactly what lru_cache already does for you.
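One caveat worth knowing: since lru_cache builds its key from the arguments, every argument must be hashable; passing an unhashable argument such as a list raises a TypeError (the function names below are hypothetical):

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def g(*args):
    return sum(args)

g(1, 2, 3)        # fine: ints are hashable
try:
    g([1, 2], 3)  # lists are unhashable, so no cache key can be built
except TypeError as exc:
    print("TypeError:", exc)
```

Your tuples, floats, and str() context representation are all hashable, so this should not be a problem in your scenario.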

If you're still on Python 2.7, Raymond Hettinger has published some LRU cache recipes that you could use in your own code.

Lukas Graf
  • 30,317
  • 8
  • 77
  • 92