I have a follow-up question to this one. One of the commenters on the original question mentions that he has seen people mistakenly use syntax like:
key in d.keys()
which runs in O(n) time, as opposed to
key in d
which runs in O(1) time, without realizing the difference. Until today (when I stumbled upon the original question while trying to understand why my code was running so slowly), I was one of those people. I tried to verify the comment using Python 2.7.5, and sure enough, here are the timeit results:
$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d.keys()'
100000 loops, best of 3: 2.18 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d'
10000000 loops, best of 3: 0.0449 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d.keys()'
100000 loops, best of 3: 17.9 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d'
10000000 loops, best of 3: 0.0453 usec per loop
There is a difference in speed of roughly 50X (2.18 usec / 0.0449 usec) for a dictionary with 100 keys, and roughly 400X (17.9 usec / 0.0453 usec) for a dictionary with 1000 keys, for lookups deliberately constructed so that the search key is too large to be found in the dictionary. In other words, O(n) vs. O(1), just as the commenter said.
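For what it's worth, here is my current (unverified) guess at what the difference boils down to, written out by hand. The helper names in_keys and in_dict are mine, purely for illustration: since d.keys() in Python 2 returns a plain list, I assume the first form scans that list, while the second form goes straight to the dict's hash table.

# A minimal sketch of what I *think* the two expressions amount to
# (Python 2.7; this is my guess, not a confirmed explanation).
d = dict.fromkeys(range(1000))

print type(d.keys())   # <type 'list'> -- d.keys() builds a real list in Python 2

def in_keys(key, d):
    # what I assume 'key in d.keys()' does: build the key list, then scan it
    for k in d.keys():
        if k == key:
            return True
    return False

def in_dict(key, d):
    # what I assume 'key in d' does: a single hash-table lookup
    return d.__contains__(key)

print in_keys(10000, d)   # False, after touching every key
print in_dict(10000, d)   # False, after (roughly) one hash probe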
My question: why the difference? The two syntaxes look practically identical, so Python must be doing something very different under the hood, but what exactly goes on internally that gives rise to the distinction?
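In case it helps narrow things down, here is how I tried to peek at what the two expressions compile to, using the standard-library dis module (I'm not reproducing the output here):

import dis

# Compile each expression and disassemble it (Python 2.7).
# This only shows *what* instructions run, not why one is so much slower.
dis.dis(compile('10000 in d.keys()', '<string>', 'eval'))
dis.dis(compile('10000 in d', '<string>', 'eval'))

As far as I can tell, both forms end in the same COMPARE_OP (in) instruction; the only visible difference is the extra LOAD_ATTR and CALL_FUNCTION for d.keys(), which is why the size of the gap surprises me.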