From what I understand, count() is slower because it iterates over the whole list again for each distinct element.
Since the Counter and dict constructions both iterate over the list only once,
shouldn't the timing results for the dict construction and Counter be similar?
I used https://stackoverflow.com/a/23909767/7125235 as the code reference for getting the time values.
import timeit

if __name__ == "__main__":
    code_block = """seen = {}
for i in l:
    if seen.get(i):
        seen[i] += 1
    else:
        seen[i] = 1
"""
    setup = """import random
import string
from collections import Counter
n=1000
l=[random.choice(string.ascii_letters) for x in range(n)]
"""
    t1 = timeit.Timer(
        stmt="Counter(l)",
        setup=setup,
    )
    t2 = timeit.Timer(
        stmt="[[x,l.count(x)] for x in set(l)]",
        setup=setup,
    )
    t3 = timeit.Timer(
        stmt=code_block,
        setup=setup,
    )
    print("Counter(): ", t1.repeat(repeat=3, number=10000))
    print("count(): ", t2.repeat(repeat=3, number=10000))
    print("seen{}: ", t3.repeat(repeat=3, number=10000))
Output:
Run 1:
Counter(): [0.32974308, 0.319977907, 0.301750341]
count(): [6.424047524000001, 6.417152854, 6.450776530000001]
seen{}: [1.1089669810000018, 1.099655232, 1.116015376]
Run 2:
Counter(): [0.322483783, 0.32464020800000004, 0.33498838900000005]
count(): [6.3235339029999995, 6.48233445, 6.543396192000001]
seen{}: [1.1192663550000006, 1.1072084830000009, 1.1155270229999985]
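
As a sanity check that the three timed statements do equivalent work, a minimal sketch like the following could be used. The function names, the `sample` variable, and the fixed seed are my own assumptions for illustration and are not part of the benchmark above.

```python
import random
import string
from collections import Counter


def count_with_counter(items):
    # Single pass over the list, handled inside Counter.
    return dict(Counter(items))


def count_with_list_count(items):
    # One full pass over the list per distinct element.
    return {x: items.count(x) for x in set(items)}


def count_with_dict(items):
    # Single pass over the list, written as an explicit Python loop.
    seen = {}
    for i in items:
        if seen.get(i):
            seen[i] += 1
        else:
            seen[i] = 1
    return seen


random.seed(0)  # assumption: fixed seed so the check is repeatable
sample = [random.choice(string.ascii_letters) for _ in range(1000)]

# All three approaches should agree on the per-letter counts.
results_match = (
    count_with_counter(sample)
    == count_with_list_count(sample)
    == count_with_dict(sample)
)
print(results_match)
```

If the three results agree, the timing differences come from how each approach does the counting rather than from what it computes.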