7

Possible Duplicate:
'has_key()' or 'in'?

In Python, there are two ways of deciding whether a key is in a dict:

if dict.has_key(key) and if key in dict

Someone told me that the second is slower than the first: supposedly the in keyword turns the expression into an iteration over the dict, whereas has_key uses the hash to make the decision.

I highly doubt there is a difference, since I think Python is smart enough to translate an in test on a dict into a hash lookup, but I can't find any formal statement about this.

So is there really any efficiency difference between the two?

Thanks.

Derrick Zhang

3 Answers

9

Both of these operations do the same thing: examine the hash table implemented in the dict for the key. Neither will iterate the entire dictionary. Keep in mind that for x in dict (iteration) is different from if x in dict (a membership test). They both use the in keyword, but are different operations.

The in keyword becomes a call on dict.__contains__, which dict can implement however it likes.
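To make the distinction concrete, here is a minimal sketch (Python 3). TracingDict is a hypothetical helper written only for this illustration, not part of any library; it records which special method each use of in ends up calling.

```python
class TracingDict(dict):
    """Hypothetical dict subclass that logs which special method gets called."""

    def __init__(self, *args, **kwargs):
        self.calls = []  # names of the special methods invoked so far
        super().__init__(*args, **kwargs)

    def __contains__(self, key):
        self.calls.append("__contains__")
        return super().__contains__(key)

    def __iter__(self):
        self.calls.append("__iter__")
        return super().__iter__()


d = TracingDict(a=1, b=2, c=3)

found = "b" in d        # membership test: dispatched to __contains__
keys = [k for k in d]   # iteration: dispatched to __iter__

print(found, keys, d.calls)
```

The membership test never touches __iter__, so it does not walk the keys; only the for form does.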

If there is a difference in the timings of these operations, it will be very small, and will have to do with the function call overhead of has_key.

BTW, the general preference is for key in dict as a clearer expression of the intent than dict.has_key(key). Note that speed has nothing to do with the preference. Readability is more important than speed unless you know you are in the critical path.

Ned Batchelder
  • ....all of this, and in addition, 'has_key()' is deprecated and should no longer be used. :) – jonesy Jul 09 '12 at 02:28
3

D.has_key is actually slower due to the function call:

>>> D = dict((x, y) for x, y in zip(range(1000000), range(1000000)))
>>> from timeit import Timer
>>> t = Timer("1700 in D", "from __main__ import D")
>>> t.timeit()
0.10631704330444336
>>> t = Timer("D.has_key(1700)", "from __main__ import D")
>>> t.timeit()
0.18113207817077637
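The session above is Python 2. On Python 3, where dict.has_key no longer exists, a rough analogue is to compare the in operator against an explicit D.__contains__(key) method call. This is only a sketch under that assumption: the explicit call stands in for has_key, the dict size and repeat count are arbitrary, and the numbers will vary by machine and interpreter version.

```python
from timeit import timeit

# Smaller dict than in the session above, just to keep the run short.
D = {x: x for x in range(100_000)}

n = 200_000  # number of repetitions per measurement

# Operator form: compiled to a dedicated bytecode, no attribute lookup.
t_in = timeit("1700 in D", globals={"D": D}, number=n)

# Explicit method call: attribute lookup plus call overhead,
# used here as a stand-in for the old D.has_key(1700).
t_call = timeit("D.__contains__(1700)", globals={"D": D}, number=n)

print(f"in operator:          {t_in / n * 1e9:.0f} ns per lookup")
print(f"explicit method call: {t_call / n * 1e9:.0f} ns per lookup")
```

Either way, both forms do a single hash lookup; any gap comes from how the call is dispatched, not from iterating the dict.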
jterrace
  • Well, considering that timeit runs the statement a million times, and the number is in seconds, the *absolute* time difference is pretty small – jterrace Jul 09 '12 at 03:42
  • Something like 80 nanoseconds.. – jterrace Jul 09 '12 at 03:44
  • It's nearly a factor of two difference, though. You make a good point (implicitly) about focusing optimization efforts in meaningful places, but still. – Karl Knechtel Jul 09 '12 at 05:22
3

has_key isn't an alternative. It's deprecated. Don't use it. (It's slower anyhow)
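For anyone migrating old code: in Python 3 the method was removed from dict entirely, so the in form is the only one that works on both Python 2 and 3. A small sketch of the replacement patterns (the dict contents here are made up for illustration):

```python
import sys

d = {"spam": 1}

# Old style (Python 2 only):   if d.has_key("spam"): ...
# Portable replacement:
if "spam" in d:
    print("spam is present")

# Old style (Python 2 only):   if not d.has_key("eggs"): ...
# Portable replacement:
if "eggs" not in d:
    print("eggs is absent")

# On Python 3 the attribute is gone altogether.
print(sys.version_info[0], hasattr(d, "has_key"))
```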

John La Rooy