2

I have a follow-up question to this one. One of the commenters on the original question mentions that he has seen people in the past mistakenly using syntax like:

key in d.keys()

which finishes in O(n) time, as opposed to

key in d

which finishes in O(1) time, not realizing the difference. Until today (when I stumbled upon the original question after trying to understand why my code was running so slowly), I was one of those people. I attempted to verify the accuracy of the comment using Python 2.7.5, and sure enough, here are the results of timeit:

$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d.keys()'
100000 loops, best of 3: 2.18 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(100))' '1000 in d'
10000000 loops, best of 3: 0.0449 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d.keys()'
100000 loops, best of 3: 17.9 usec per loop
$ python -m timeit -s 'd=dict.fromkeys(range(1000))' '10000 in d'
10000000 loops, best of 3: 0.0453 usec per loop    

There is a factor of roughly 50X difference in speed (2.19 usec / 0.0449 usec), for a dictionary with 100 keys, and 400X difference (17.9 usec / 0.0453 usec) for a dictionary with 1000 keys, for searches which are explicitly constructed so that the search key will be too large to be found in the dictionary. So in other words, O(n) vs. O(1), just as the commenter said.

My question: why is it different? The two syntaxes look practically identical! Clearly Python must be doing something very different under the hood between these two cases, but what exactly is going on internally that actually gives rise to the distinction?

Community
  • 1
  • 1
stachyra
  • 4,423
  • 4
  • 20
  • 34

2 Answers2

8

d.keys() returns a list which is a copy of the dict keys, not a view. Constructing that list takes O(n), as does the lookup, which uses list.__contains__ i.e. iterating the keys.

On the other hand, key in d essentially calls

d.__contains__(key)

The method dict.__contains__ is implemented efficiently with an O(1) hash lookup on the dict keys. This is precisely the raison d'être for the dict data structure, and it is the same reason you get a fast O(1) lookup when you access a dict with d[key].

In summary, key in d.keys() is never appropriate in python 2.

Note: in python3 this has changed, where using key in d.keys() would be more or less equivalent to key in d.viewkeys() for python 2.

wim
  • 338,267
  • 99
  • 616
  • 750
0

A dictionary uses a hash of key to index into the keys. If the index exists, you only have to do one comparison; that's O(1). If you use d.keys() you first get a copy of all keys as a list, and afterwards you have to compare each element of this list with your key, that's why it is O(n).

Daniel
  • 42,087
  • 4
  • 55
  • 81