2

"One semi-gotcha to avoid though is to make sure you do: key in some_dict rather than key in some_dict.keys(). Both are equivalent semantically, but performance-wise the latter is much slower (O(n) vs O(1)). I've seen people do the in dict.keys() thinking it's more explicit & therefore better."

I found this piece of advice online. Can anyone please explain and justify the above difference in performance? How is the working of these two seemingly similar statements so different?

EDIT: To be more precise, how is indexing in a dictionary faster than indexing in a list? As far as I've learned, hash tables are arrays of linked lists. The array being an array of the keys. So finding a key in a hash table should be similar to finding that key in a list of keys. (?)

  • Are you familiar with the theory of [hash tables](https://en.wikipedia.org/wiki/Hash_table)? – PM 2Ring Feb 04 '17 at 11:56
  • @PM2Ring I have indeed implemented it in C. So I'd say yeah, I am familiar with the data structure. Am I missing something obvious? – Pranjal Verma Feb 04 '17 at 11:58
  • Another similar question http://stackoverflow.com/questions/1602934/check-if-a-given-key-already-exists-in-a-dictionary – Mazdak Feb 04 '17 at 12:00
  • @Kasramvd That question is certainly worth linking, but sadly none of those answers explain _why_ `key in some_dict.keys()` is bad, although it's mentioned briefly in [this comment](http://stackoverflow.com/questions/1602934/check-if-a-given-key-already-exists-in-a-dictionary#comment1468149_1602945). – PM 2Ring Feb 04 '17 at 12:02
  • @PM2Ring Yes, and that's why I linked the current question as it's about the best way for *key lookup* in a dictionary. – Mazdak Feb 04 '17 at 12:04
  • @Kasramvd Yeah that question does seem similar but that one comment still just rephrases the same quote from my question. I still don't have an answer tho. :\ – Pranjal Verma Feb 04 '17 at 12:20
  • @PranjalVerma Do you understand Leon's answer? – PM 2Ring Feb 04 '17 at 12:23
  • To answer to your question after edit I'd reference you to [THIS](http://stackoverflow.com/questions/42040020/difference-in-performance-of-two-seemingly-similar-dict-statements?noredirect=1#comment71253574_42040020) comment – Mazdak Feb 04 '17 at 12:25
  • @PM2Ring just read it, I think I get it now. I don't understand his answer word to word but I think I get the idea. – Pranjal Verma Feb 04 '17 at 12:27
  • 1
    You should read this article by Laurent Luce about the [Python dictionary implementation](http://www.laurentluce.com/posts/python-dictionary-implementation). The exact implementation of the `dict` object has changed in Python 3.6, but the hashing process is still very similar, AFAIK. And since you can read & write C you may find it helpful to look at the [dictobject.c](https://github.com/python/cpython/blob/master/Objects/dictobject.c) source code. – PM 2Ring Feb 04 '17 at 12:54
  • 1
    [This answer](http://stackoverflow.com/a/39980744/4014959) by Jim Fasarakis-Hilliard briefly explains the new Python 3.6 dictionary. – PM 2Ring Feb 04 '17 at 13:01
  • Oh wow, thanks mate! This is amazing! – Pranjal Verma Feb 04 '17 at 13:53

1 Answers1

3

It is true only for Python 2.

In Python 3, dict.keys() returns a view object dict_keys that wraps the source dict object:

$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
>>> d = { 1: 11, 2:22, 3:33 }
>>> k = d.keys()
>>> k
dict_keys([1, 2, 3])
>>> d
{1: 11, 2: 22, 3: 33}
>>> d[4] = 44
>>> k
dict_keys([1, 2, 3, 4])  #!!! k includes the new key that was added to d
>>> 

As a result, in Python 3, key in dict.keys() is effectively executed almost as key in dict:

  1. dict.keys() creates the dict_keys object in O(1) time and then
  2. the query operation is rerouted through dict_keys back to dict which performs it in O(1) time.

Unlike Python 3, in Python 2, dict.keys() returns a list object which has to be constructed in O(n) time:

$ python2
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
>>> d = { 1: 11, 2:22, 3:33 }
>>> k = d.keys()
>>> k
[1, 2, 3]
>>> d[4] = 44
>>> k
[1, 2, 3]
>>> 

Therefore, in Python 2, key in dict.keys() (as a test, rather than as a part of for key in dict.keys()) will have two sources of O(n) time complexity:

  1. Building the list returned by dict.keys() takes O(n) time
  2. Checking if the query value is in the returned list takes another O(n) time.
Leon
  • 31,443
  • 4
  • 72
  • 97
  • Not only is the list construction time O(n), the list search time is also O(n), since `k in some_list` performs a linear scan. Your answer really should mention that. Both operations are performed at C speed, so they're much faster than doing it with explicit Python loops, but it's certainly wise to avoid those unnecessary operations. – PM 2Ring Feb 04 '17 at 12:06
  • @PM2Ring You are right. I somehow misread the question assuming that *`element`* **`in`** *`set`* is used in the context of a `for` loop. – Leon Feb 04 '17 at 12:14
  • The Python 3 `dict.keys()` is a [view object](https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects), which is set-like, although it's not exactly a set. The Python 2 `dict.keys()` is a plain old static list. – PM 2Ring Feb 04 '17 at 12:20
  • I don't know why you got a down-vote. Maybe the down-voter doesn't understand your answer... – PM 2Ring Feb 04 '17 at 12:22
  • @Leon oh okay! a 'view object'. That's completely new for me. I'll have to look into it. Thanks. :) – Pranjal Verma Feb 04 '17 at 12:28
  • @PranjalVerma The `dict_keys` view object only helps to make the execution of `key in dict.keys()` almost as fast as the execution of `key in dict`. The speed-up is relative to the case where `dict.keys()` returns a plain Python list. In order to understand why `key in dict` takes O(1) time (both in Python 2 and 3), you must understand how hash-tables work. – Leon Feb 04 '17 at 13:27