As mentioned in this thread Should I use 'has_key()' or 'in' on Python dicts?, 'key in D.keys()' is said to be O(N), while 'key in D' is only O(1).
I tried the following test
from timeit import default_timer as timer
#test if key in D is O(1) but key in D.keys() is O(N)
repeat=100; timer_key={}
for N in [1000, 10000, 100000, 2*10**5, 5*10**5, 10**6, 2*10**6, 5*10**6]:
t_genD = 0; t_in_D = 0; t_in_Dkeys = 0;
for _ in range(repeat):
start = timer(); D = dict.fromkeys(range(N)); t_genD += timer() -start
start = timer(); tmp = 1 in D; t_in_D += timer() -start
start = timer(); tmp = 1 in D.keys(); t_in_Dkeys += timer() -start
timer_key[N]=( t_genD/repeat, t_in_D/repeat, t_in_Dkeys/repeat )
print('N=', N, timer_key[N])
The result in python3 is something like the following (depends on where you run it). For the shown timing: 1st column is for D generation, 2nd column for key in D, 3rd column for k in D.keys()
N= 10000 (0.0005010669608600438, 3.3366493880748747e-07, 4.834565334022045e-07)
N= 100000 (0.005877389032393694, 4.0351413190364836e-07, 4.942761734127999e-07)
N= 200000 (0.012369387333747, 5.285115912556648e-07, 9.471247904002666e-07)
N= 500000 (0.03357502325205133, 8.423859253525734e-07, 1.718592830002308e-06)
N= 1000000 (0.07786185476696118, 1.4286534860730172e-06, 3.0477484688162803e-06)
N= 2000000 (0.16994113162159918, 1.5859492123126984e-06, 3.214373718947172e-06)
N= 5000000 (0.43450434028636664, 1.2850086204707622e-06, 2.0523485727608205e-06)
Do the timing results make you believe that 'k in D.keys()' vs 'k in D' is O(N) vs O(1), like the one claimed in the thread linked above? In python2 likely the claim is true since D.keys() return a list, while in python3 it appears not to be the case. Someone said 'k in D.keys()' in python3 is like 'k in D.viewkeys()' in python2, which may explain the close performance shown above, I would like to hear about other perspective on the performance comparison. Thanks.