5

For example:

d = {"John": "Doe", "Paul": "Allen", "Bill": "Gates"}

Only imagine this had several thousand / million such names, all unique by first name.

If I wanted to see if the key "Paul" existed, what does it do under the hood?

AureliusPhi
  • 433
  • 2
  • 12

2 Answers2

6

Python's dictionary implementation reduces the average complexity of dictionary lookups to O(1) by requiring that key objects provide a "hash" function. Such a hash function takes the information in a key object and uses it to produce an integer, called a hash value. This hash value is then used to determine which "bucket" this (key, value) pair should be placed into. Pseudocode for this lookup function might look something like:

def lookup(d, key):
    '''dictionary lookup is done in three steps:
       1. A hash value of the key is computed using a hash function.

       2. The hash value addresses a location in d.data which is
          supposed to be an array of "buckets" or "collision lists"
          which contain the (key,value) pairs.

       3. The collision list addressed by the hash value is searched
         sequentially until a pair is found with pair[0] == key. The
         return value of the lookup is then pair[1].
   '''
   h = hash(key)                  # step 1
   cl = d.data[h]                 # step 2
   for pair in cl:                # step 3
       if key == pair[0]:
           return pair[1]
   else:
       raise KeyError, "Key %s not found." % key

From the Python Wiki

JNevens
  • 11,202
  • 9
  • 46
  • 72
  • That depends on the size of the hash table relative to the size of the dictionary, e.g. the table can have 5 entries, but you have 10 keys, this will surely lead to a collision (so bins with more than 1 value in it). It might be possible that the hash table can grow dynamically, I don't know the details. It also depends on how good the hashing algorithm is. Again, I'm not familiar with the details of the implementation in Python. – JNevens Nov 17 '14 at 19:42
  • 1
    Python hash tables do grow dynamically (similar to Python arrays) and they tolerate collisions by checking equality on a hit and, on a mismatch, using the next available space in a certain order that encourages equal distribution. – Andrew Gorcester Nov 20 '14 at 18:32
2

Python dictionaries are implemented with a hash table. So O(1) lookup on average (depending on the strength of your hash function).

References:

Community
  • 1
  • 1
Bill Lynch
  • 80,138
  • 16
  • 128
  • 173
  • 1
    O(1) on average, not worst case. – shx2 Nov 17 '14 at 19:23
  • @shx2 generally if you're doing enough lookups for it to matter, average case is what dominates. Unless you have the possibility of a denial-of-service attack. – Mark Ransom Nov 17 '14 at 19:25
  • @MarkRansom: Except the worst case also can assume a terrible hashing algorithm. – Bill Lynch Nov 17 '14 at 19:26
  • 1
    @BillLynch: the hashing is not the issue, collisions are. Python added a random seed to dictionaries to defend against a class of DOS attacks that would exploit the worst case scenario by creating string values specifically crafted to always collide. Send in 10000 of those to tie up web servers, for example. – Martijn Pieters Nov 17 '14 at 19:30
  • @BillLynch you mean a hashing algorithm similar to this random number generator? http://dilbert.com/strips/comic/2001-10-25/ – Mark Ransom Nov 17 '14 at 19:33
  • Yes, because when all the keys generate the same hash, they all reside in the same bucket. So to find a certain key-value-pair, you have to check all the pairs in that bucket, so it's again linear search. – JNevens Nov 17 '14 at 19:47