1

I'm trying to cache the function expand by only its first argument. I don't care about the values of the other arguments for the purpose of caching.

Since the other arguments are dicts, they aren't cacheable, so I've defined a class to contain these arguments whose hash always returns 0, so it should be ignored by the caching function.

I have added some cutdown code below. I'm on Python version 3.5.2.

class Node:
    def __init__(self, value):
        self.value = value

    def expand(self, a1, a2):
        return '{},{},{}'.format(self.value, a1, a2)


class ExpandArgs:
    def __init__(self, a1, a2):
        self.a1 = a1
        self.a2 = a2

    def __hash__(self):
        # We don't care about the hash, but it's required for caching
        return 0


@functools.lru_cache(maxsize=None)  # hash of args is always 0, so it should be ignored, and the hash of node should be used as the cache key
def expand(node, args):
    a1 = args.a1
    a2 = args.a2
    return node.expand(a1, a2)


e1 = ExpandArgs({}, {})
e2 = ExpandArgs({}, {})
print(hash(e1))  # 0
print(hash(e2))  # 0
node = Node(123)
print(expand.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
expand(node, e1)
print(expand.cache_info())  # CacheInfo(hits=0, misses=1, maxsize=None, currsize=1)
expand(node, e2)
print(expand.cache_info())  # CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
expand(node, e1)
print(expand.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
expand(node, e2)
print(expand.cache_info())  # CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)

Since hash(e1) == hash(e2), I would expect the second call to expand() to hit the cached value for e1, but it misses.

Why don't I get 1 cache miss, and 3 cache hits for the above code?

philip238
  • 21
  • 4

2 Answers2

1

Turns out eq is used rather than hash for checking whether the function arguments are equal for the purposes of caching, so when I change the class it works.

class ExpandArgs:
    def __init__(self, context, forecast_transaction_node_map, date_range_func):
        self.context = context
        self.forecast_transaction_node_map = forecast_transaction_node_map
        self.date_range_func = date_range_func

    def __hash__(self):
        # We don't care about the hash, but it's required for caching
        return 0

    def __eq__(self, other):
        return isinstance(other, self.__class__)
philip238
  • 21
  • 4
0

Just wanted to add some notes on this since I spent some time reading the functools source code, trying to figure out why __eq__ is being used here.

It turns out that this is a basic feature of Python dictionaries that are used as the cache (tested on python 3.9): __hash__ is used first but __eq__ is used when the hash matches to make sure the objects are actually different:

In [5]: class CompTuple(tuple):
   ...:     """Tuple that prints whenever equality operator is
   ...:  used"""
   ...:
   ...:     __hash__ = tuple.__hash__
   ...:
   ...:     def __eq__(self, other):
   ...:         print("equality comparison")
   ...:         return super().__eq__(other)
   ...:

In [6]: t1 = CompTuple( (1,2,3,) )

In [7]: t2 = CompTuple( (1,2,4,) )  # has different hash than t1

In [8]: t3 = CompTuple( (1,2,3,) )  # has same hash as t1

In [9]: d={}

In [10]: d[t1]=1

In [11]: d[t2]
--------------------------------------------------------------
KeyError                     Traceback (most recent call last)
<ipython-input-11-c1f54cc7c51f> in <module>
----> 1 d[t2]

KeyError: (1, 2, 4)

In [12]: d[t3]
equality comparison  # equality comparison because of same hash
Out[12]: 1

In [13]: d[t2]=2

In [14]: d[t3]       # still only one equality comparison
equality comparison
Out[14]: 1

The python docs explicitly require that objects which compare equal have the same hash value.

However, a call to __eq__ can be substantially more expensive than a call to __hash__. This can have the counterintuitive effect of making an lru_cache lookup expensive when the cache already contains an object of the same hash.

P.S. Just to make things a little more confusing, there is a shortcut in the dictionary lookup that skips the call to __eq__ when comparing identical objects (=objects with same id):

In [15]: d[t1]  # no equality comparison, t1 *is* in the cache
Out[15]: 1
leopold.talirz
  • 640
  • 7
  • 19