Performance comparison of using `in` vs try except in python

Question

I have a dictionary in some code which maps a key to a word, the key is the result of an md5 hash. I have code that essentially wants to get the key for a word, and when it doesn't already exist, add it to the dictionary

Here was my first implementation:

key = int(hashlib.md5(word).hexdigest(), 16)
if key in self.id_to_word.keys():
    assert word == self.id_to_word[key]
else:
    self.id_to_word[key] = word
return key

After profiling my code I found this to be EXTREMELY slow. So then I tried this, which is functionally equivalent

key = int(hashlib.md5(word).hexdigest(), 16)

try:
    assert word == self.id_to_word[key]
    return key
except KeyError:
    self.id_to_word[key] = word

This turned out to be incredibly faster. While I'm certainly happy about the performance improvement, I was wondering if someone could explain to me why. Is it bad practice to check for something in a keys() function from a dictionary like that? Is it generating copies of that every time (wasting a lot of computation)?

.keys() creates a list of all the keys, and the `in` statement searches all those keys, an O(N) operation. `key in dict` is O(1) operation — reptilicus, Apr 08 '15 at 19:05
This discussion doesn't work for you: http://stackoverflow.com/q/1835756/349420 ? — Esparta Palma, Apr 08 '15 at 19:08

score 3 · Accepted Answer · answered Apr 08 '15 at 19:06

3

id_to_word.keys() creates a new list, which is linearly searched, which is much slower than a hash lookup. Remove the .keys().

The fastest way would be:

key = int(hashlib.md5(word).hexdigest(), 16)
assert word == self.id_to_word.setdefault(key, word)

answered Apr 08 '15 at 19:06

Daniel

42,087
4
55
81

The only issue with this is that `assert` should not be used in production code. `python -O` (the optimize flag) will remove an assert from code. At that time, your code breaks. – Deacon Apr 13 '15 at 18:08

Bakuriu · Answer 2 · 2015-04-08T19:11:52.600

This is to be expected (in python2). The keys() method returns a list of keys. So using the in operator on the list takes linear time.

Trying to access the item is constant time, which is much faster.

Note: you can simply use key in dictionary instead of the try: ...except:.

Note that dictionaries have a setdefault method that already does what you want. Moreover if you do that operation a lot of time you should consider using collections.defaultdict instead of a plain dictionary.

score 2 · Answer 3 · answered Apr 08 '15 at 19:06

key in some_dict is much faster than key in some_dict.keys()

Dict Lookup key in some_dict is O(1) complexity so its very fast

that said its still (very marginally)slower in the case where the key is in the dict than just try/except

the real answer is there is no real measurable difference between these 2 methods and do whatever feels right to you

reptilicus · Answer 4 · 2015-04-08T20:56:14.617

1

Elaborating on my comments above:

In [4]: d = {k:k for k in xrange(1000)}
In [5]: %timeit 50 in d
10000000 loops, best of 3: 68.6 ns per loop

In [6]: %timeit 50 in d.keys()
100000 loops, best of 3: 6.35 µs per loop

as you can see, using d.keys() is about 100 times slower.

edited Apr 08 '15 at 20:56

answered Apr 08 '15 at 19:09

reptilicus

10,290
6
55
79

score 0 · Answer 5 · answered Apr 08 '15 at 19:40

As mentioned setdefault() solves your problem without an if or try block.
But "Easier to Ask for Forgiveness than Permission" [EAFP] and duck-typing is a common idiom in Python, compared to the more defensive "Look Before You Leap" [LBYL] idiom common in other languages, e.g. Java, C++.

Jeff Knuth has an interesting blog post on it Write Clean Python: Use Exceptions

Performance comparison of using `in` vs try except in python

5 Answers5