Suppose an object `a` has a very expensive hash function, and I want to look up `a` in several different dicts or sets. If I do it naively:

d1_res = d1[a]
d2_res = d2[a]

I have to compute the hash twice. What I would like is something like:

EDIT: The following code in the original question is wrong!

hashvalue = hash(a)
d1_res = d1.getitem(a, hashvalue=hash)
d2_res = d2.getitem(a, hashvalue=hash)

EDIT: This is the correct sample code

hashvalue = hash(a)
d1_res = d1.getitem(a, hashvalue=hashvalue)
d2_res = d2.getitem(a, hashvalue=hashvalue)

That way I would only need to compute the hash once. Is there any way to do this? Or is there some underlying Python mechanism that prevents such an interface?

EDIT: the message below is important

An easy solution would seem to be caching the hash result in the `__hash__` method, but my example here is simplified. In my real case the hash function is not expensive (just an int hash), but the hashing is carried out many times and I want to cut that cost. I'm writing a C/C++ extension, so I'm looking for any possible performance improvement.

Thanks in advance.

liwt31
  • `d1.getitem(a, hashvalue=hash)` will return the value stored under key `a`; the 2-param version provides you a default value to return if the key is _not_ in the dict. Your `hashvalue=hash` looks wrong. Are you trying to create a dict for hash lookup of object a? What is the dictionary using to store your object 'a' - does it use `a` itself as key? Then it will call `a.__hash__` ... I kinda do not get you... – Patrick Artner Mar 04 '19 at 08:21
  • How about inheriting from dict and caching the hash(a) inside for reuse? See https://stackoverflow.com/questions/2390827/how-to-properly-subclass-dict-and-override-getitem-setitem/2390997 – balderman Mar 04 '19 at 08:24
  • https://en.wikipedia.org/wiki/XY_problem ? to store `a` you need its `__hash__` but you do not like to calculate the hash because its costly so you store the hash in some other dict ... ? – Patrick Artner Mar 04 '19 at 08:25
  • @balderman How exactly would you reuse the cached hash though? – Aran-Fey Mar 04 '19 at 08:25
  • You can have your own version of get_item that will get your object as argument (and optional int - the hash) and will return a tuple with 2 entries: 1) the value that is pointed by your object 2) the hash of this object. Next time you do the get item you can pass the hash value you got in the previous call – balderman Mar 04 '19 at 08:56
  • I'm sorry for the confusion. I provided incorrect sample code. Please check out the modified question. I also emphasized that I can't simply use a cache. – liwt31 Mar 04 '19 at 14:00

1 Answer

Here is an idea: let the object itself (the dict key) cache its own hash.

The dict implementation does not need to know about this; it just calls hash() as usual.

Every setter resets the cached hash value to None, which forces recalculation the next time the hash is requested.

class MyComplexObject:
    def __init__(self, name, size):
        self._name = name
        self._size = size
        self.hash_value = None

    def __hash__(self):
        if self.hash_value is None:
            # The heavy calculation goes here.
            # Its result is 7 (a placeholder value, for illustration only).
            self.hash_value = 7
        return self.hash_value

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, name):
        self._name = name
        self.hash_value = None

    @property
    def size(self):
        return self._size

    @size.setter
    def size(self, size):
        self._size = size
        self.hash_value = None
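
As a rough check of the caching idea above, here is a minimal sketch that counts how often the hash is actually computed when the same key is used in two dicts. The class name `CountingKey`, the `computations` counter, and the choice of `hash((name, size))` as a stand-in for the heavy calculation are all assumptions made for illustration; they are not part of the original answer.

```python
class CountingKey:
    """Hypothetical key type that counts real hash computations."""

    def __init__(self, name, size):
        self._name = name
        self._size = size
        self._hash_value = None   # cached hash, None means "not computed yet"
        self.computations = 0     # illustration only: counts cache misses

    def __hash__(self):
        if self._hash_value is None:
            # Stand-in for the expensive calculation (an assumption).
            self.computations += 1
            self._hash_value = hash((self._name, self._size))
        return self._hash_value

    def __eq__(self, other):
        if not isinstance(other, CountingKey):
            return NotImplemented
        return (self._name, self._size) == (other._name, other._size)


a = CountingKey("a", 1)
d1 = {a: "from d1"}
d2 = {a: "from d2"}

d1_res = d1[a]
d2_res = d2[a]

# Two insertions and two lookups, but only one real computation.
print(a.computations)
```

Note that `__hash__` is still called on every insertion and lookup; only the expensive part is skipped. This matches the question's remark that caching does not remove the per-lookup call overhead.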
Patrick Artner
balderman