I understand hash tables: In principle, you store a key's data in a fixed-size array, with the index/slot to use given by the key's hash. However, Python's dict class has the method dict.keys()
which returns a list of the dict's keys. Where does this list come from? (Also, iterating over a dictionary implicitly iterates over its keys).
I tried to think about this myself and I identified the following requirements:
- insertion in O(1)
- deletion in O(1)
- iteration in O(n)
I thought maybe we could store the index of the next and previous non-empty slots for each slot, so we could jump to the next/prev element in O(1) and also clear a slot in O(1) (just update prev's next index and next's prev index). The problem is that insertion would be in O(log n) there because we would have to binary-search over the 'next' indexes.
Another explanation I considered is that maybe we just iterate over all the slots, and just ignore the empty slots and accept the runtime penalty from repeatedly checking empty slots every time we iterate over the keys. A disadvantage of this is that the hash table would need to be pretty full for this to be efficient, which would slow down insertions.
The related question "How are Python's Built In Dictionaries Implemented" never mentioned this aspect of dicts.
Edit: Source code for iteration seems to be here.