1

I'd like to know if there is a way to override the hash function that are already defined for builtin types like int. In python the hash of an int gives his own value and i'd like to avoid that in my project.

The hash would be used for a dict, so overriding the hash function that the dict uses could work too (but I don't think it is possible).

I have tried to create my own function and replace the int hash by that, but it didn't work.

def my_hash(obj) :
    return hash((obj,))

int.__hash__ = my_hash

The code above gave me the error "TypeError: can't set attributes of built-in/extension type 'int'".

Edit:

I try to do that because I have an instance of the problem that is designed to have a lot of collisions in the hashtable. Such an instance can be created because we know that integers are hashed to themselves.

Here's the code that generate the instance:

t, n = 1, 2 * 10 ** 5
mask = (1 << 17) - 1
fill = int((1 << 15) * 1.3 + 1)
arr = []
arr += [mask + 2] * 2
x = 6
for i in range(1, fill):
    arr += [x] + [x]
    x = x * 5 + 1
    x = x & mask
arr += [1] * (n - len(arr))
larticho
  • 161
  • 8
  • 2
    As you sure you want to do this? I wouldn't be completely shocked if someone could identify a scenario where this could be a good idea, but I'd be surprised. How does the int hash function fail you? – President James K. Polk Jan 28 '23 at 23:06
  • @PresidentJamesK.Polk The problem is, some instance of the problem I have are designed to have a lot of collisions in the hash table (maybe only collisions actually). That's why I'd like to create my own hash function for integers. – larticho Jan 28 '23 at 23:19
  • 2
    Can you show such instances designed to have a lot of collisions? – Kelly Bundy Jan 28 '23 at 23:24
  • @KellyBundy I've added an edit – larticho Jan 28 '23 at 23:31
  • What makes you think those numbers cause a lot of collisions? – Kelly Bundy Jan 28 '23 at 23:34
  • 2
    @KellyBundy: Try it. `dict.fromkeys(arr)` takes far longer on this input than on something like `range(200000)`. – user2357112 Jan 28 '23 at 23:38
  • 2
    Python's built-in int hash works fine on most input, but it has no protection against adversarial input. Unlike strings, there's no hash randomization. It's very easy to mount a collision attack against the Python int hash. The best option I'm aware of for mitigating this is to use `str(n)` instead of `n` as the key when adversarial input is a problem. – user2357112 Jan 28 '23 at 23:42
  • @user2357112 Interesting. It's nowhere near as bad as if you replaced it with `arr = [i * (2**61-1) for i in range(len(arr))]`, but still worse than I would've thought. – Kelly Bundy Jan 28 '23 at 23:44
  • @KellyBundy you can see in [this link](https://stackoverflow.com/questions/13514716/overriding-pythons-hashing-function-in-dictionary) for example (in the answer) that the probing function used by python dictionaries is very similar to what is used to generate the input. – larticho Jan 28 '23 at 23:47
  • I don't think you can change int or dict to make this work (except by modifying Python itself), so as the question is currently written, I'm afraid your quest doesn't have a solution. But we could probably find one if you asked about the "project", i.e., if we can use something other than just raw int and dict. – Kelly Bundy Jan 29 '23 at 00:03
  • (Or if we can change how your project uses the ints and dicts.) – Kelly Bundy Jan 29 '23 at 00:27

1 Answers1

-1

So apparently, it is not possible to directly override the int hash function, but it is possible to get around the problem.

One solution (as mentioned by @user2357112 in the comments) is to use the string representation of the integers as key to the dictionary.

Another solution is to create my own integer class that overrides the hash function:

class MyInt(int):
    def __new__(cls, *args, **kwargs):
        return  super(MyInt, cls).__new__(cls, args[0])
    
    def __hash__(self):
        return hash((super().__hash__(),))

I can then use these instead of regular int in my problem.

This allows me to define the hash function however I want to prevent a collision attack.

larticho
  • 161
  • 8