3

Is there a way to extract existing key hashes from a dictionary, without recalculating them again?

What would be the risks for exposing them and consequntly, accessing the dict by hashes not keys?

Vlad M
  • 477
  • 3
  • 10
  • 1
    I'm not sure what you're asking. A hash and a key are not equivalent - hashes can collide, but keys cannot. I guess it would depend on the implementation if you can go through the internals of the hash and inspect each entry, but this doesn't exist in CPython – Mark Nunberg Apr 01 '16 at 19:55
  • 1
    `for key in my_dict: print hash(key)` I guess maybe ... – Joran Beasley Apr 01 '16 at 19:56
  • @JoranBeasley technically those hashes can be recomputed again though (a `__hash__` implementation may cache the hash, but it's not guaranteed) – Mark Nunberg Apr 01 '16 at 19:58

2 Answers2

2

I don't think Python's dictionary objects have any public API that allows you to see the hashes their objects are stored with. You can't store an object directly by hash in Python code (it may be possible by calling internal C functions in CPython). There are a few good reasons that you can't add values to a dictionary by hash value, rather than by key.

The most obvious is that multiple key objects might have the same hash. If such a hash collision happens, the second value will be inserted somewhere else in the hash table. The important thing is that it won't overwrite the previous value stored under a different key that hashes the same. If you could just pass the hash and not the key too, Python wouldn't be able to tell if you were using the same key or if you were providing a new key that happened to have a colliding hash.

A secondary reason that you can't insert by hash is that it would be a security vulnerability. The performance of a hash table like Python's dictionaries is very good when there are few hash collisions. It is very bad however if every hash is the same. If you could submit data to a Python program that all hashes to the same value, you can perform a very efficient denial of service attack (the new hash randomization for strings was added in recent versions of Python to make this kind of attack harder).

Blckknght
  • 100,903
  • 11
  • 120
  • 169
  • Well, you could still shoot yourself in the foot by providing a custom `__hash__` method that returns the same value regardless of the contents, and simply differs in comparison results – Mark Nunberg Apr 01 '16 at 19:59
  • Exactly doing that: query a dict by providing a hash -> datadump of (value or a list of values). Dict object Introspection. – Vlad M Apr 02 '16 at 03:12
2

A Python dict's keys must be hashable, i.e., implement the __hash__ special method (as well as some other methods irrelevant to your question), or be one of some predetermined built in types. So you actually can access the hash value of a key without the table, e.g., through

>>> '123'.__hash__()
163512108404620371

or more uniformly

>>> hash('123')
163512108404620371
>>> hash(2)
2

That being said, as noted by the comments, a hash value and a position in a table are not the same thing. In fact, as a table resizes, the hash value of a key will remain the same, but the position might change. Consequently, as:

  • the hash value is easily available to you via hash()

  • the position would expose the dictionary's internal state

  • you can "cache" hash values in your objects easily enough in the __hash__ method

there is probably no point in exposing the keys' positions.

Ami Tavory
  • 74,578
  • 11
  • 141
  • 185
  • You could also use the `hash()` built-in function instead of going to the method directly. – zondo Apr 01 '16 at 20:02
  • Thanks, @zondo - I've already updated it, but I appreciate the comment. – Ami Tavory Apr 01 '16 at 20:04
  • @zondo However, it was important for me to mention this method, as perhaps the OP's motivation was to avoid recalculating the hash value, and my point was that it could be cached within the `__hash__` method. – Ami Tavory Apr 01 '16 at 20:05