0

As mentioned in this thread Should I use 'has_key()' or 'in' on Python dicts?, 'key in D.keys()' is said to be O(N), while 'key in D' is only O(1).

I tried the following test

from timeit import default_timer as timer

#test if key in D is O(1) but key in D.keys() is O(N)
repeat=100;  timer_key={}
for N in [1000,  10000,  100000, 2*10**5, 5*10**5, 10**6, 2*10**6, 5*10**6]:
    t_genD = 0;  t_in_D = 0;  t_in_Dkeys = 0;
    for _ in range(repeat):
        start = timer();  D = dict.fromkeys(range(N));  t_genD += timer() -start
        start = timer();  tmp = 1 in D;          t_in_D += timer() -start
        start = timer();  tmp = 1 in D.keys();   t_in_Dkeys += timer() -start
        timer_key[N]=( t_genD/repeat, t_in_D/repeat,  t_in_Dkeys/repeat )
        
    print('N=', N,  timer_key[N])

The result in python3 is something like the following (depends on where you run it). For the shown timing: 1st column is for D generation, 2nd column for key in D, 3rd column for k in D.keys()

N= 10000 (0.0005010669608600438, 3.3366493880748747e-07, 4.834565334022045e-07)
N= 100000 (0.005877389032393694, 4.0351413190364836e-07, 4.942761734127999e-07)
N= 200000 (0.012369387333747, 5.285115912556648e-07, 9.471247904002666e-07)
N= 500000 (0.03357502325205133, 8.423859253525734e-07, 1.718592830002308e-06)
N= 1000000 (0.07786185476696118, 1.4286534860730172e-06, 3.0477484688162803e-06)
N= 2000000 (0.16994113162159918, 1.5859492123126984e-06, 3.214373718947172e-06)
N= 5000000 (0.43450434028636664, 1.2850086204707622e-06, 2.0523485727608205e-06)

Do the timing results make you believe that 'k in D.keys()' vs 'k in D' is O(N) vs O(1), like the one claimed in the thread linked above? In python2 likely the claim is true since D.keys() return a list, while in python3 it appears not to be the case. Someone said 'k in D.keys()' in python3 is like 'k in D.viewkeys()' in python2, which may explain the close performance shown above, I would like to hear about other perspective on the performance comparison. Thanks.

water stone
  • 111
  • 2

1 Answers1

1

In Python 3, dict.keys() no longer produces a list, it produces a "keys view" object, which is a view, backed live by the underlying dict, of the keys, with full hashed lookups. So it's O(1) to do either key in d or key in d.keys(); the latter just spends a little time unnecessarily constructing the view object to perform the lookup on, then checking for membership (which the view object delegates to the underlying dict).

You are correct that d.keys() in Python 3 is equivalent to d.viewkeys() in Python 2; viewkeys was added to aid the transition to Python 3 (the 2to3 converter recognizes it and replaces it with plain keys, while code using d.keys() in Python 2 is usually converted to list(d.keys()) for Python 3).

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271