Counter
has one major bottleneck when you count short iterables: It checks if isinstance(iterable, Mapping)
. This test is rather slow because collections.abc.Mapping
is an abstract metaclass and as such the isinstance
-check is a bit more complicated than normal isinstance
-checks, see for example: "Why is checking isinstance(something, Mapping) so slow?"
So it isn't really surprising if other approaches are faster for short iterables. However for long iterables the check doesn't matter much and Counter
should be faster (at least for python-3.x (CPython) where the actual counting function _count_elements
is written in c).
An easy way to identify bottlenecks is profiling. I'll use line_profiler here:
%load_ext line_profiler
from collections import Counter
x = range(50)
# Profile the function Counter.update when executing the command "Counter(x)"
%lprun -f Counter.update Counter(x)
The result:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
604 1 8 8.0 3.9 if not args:
605 raise TypeError("descriptor 'update' of 'Counter' object "
606 "needs an argument")
607 1 13 13.0 6.4 self, *args = args
608 1 6 6.0 3.0 if len(args) > 1:
609 raise TypeError('expected at most 1 arguments, got %d' % len(args))
610 1 5 5.0 2.5 iterable = args[0] if args else None
611 1 3 3.0 1.5 if iterable is not None:
612 1 94 94.0 46.3 if isinstance(iterable, Mapping):
613 if self:
614 self_get = self.get
615 for elem, count in iterable.items():
616 self[elem] = count + self_get(elem, 0)
617 else:
618 super(Counter, self).update(iterable) # fast path when counter is empty
619 else:
620 1 69 69.0 34.0 _count_elements(self, iterable)
621 1 5 5.0 2.5 if kwds:
622 self.update(kwds)
So the time it takes to initialize a Counter
from an iterable has a rather big constant factor (46% of the time are spent on the isinstance
check, getting the dictionary with the counts takes only 34%).
However for long iterables it doesn't matter (much) because it's just done once:
%lprun -f Counter.update Counter([1]*100000)
Line # Hits Time Per Hit % Time Line Contents
==============================================================
604 1 12 12.0 0.0 if not args:
605 raise TypeError("descriptor 'update' of 'Counter' object "
606 "needs an argument")
607 1 12 12.0 0.0 self, *args = args
608 1 6 6.0 0.0 if len(args) > 1:
609 raise TypeError('expected at most 1 arguments, got %d' % len(args))
610 1 6 6.0 0.0 iterable = args[0] if args else None
611 1 3 3.0 0.0 if iterable is not None:
612 1 97 97.0 0.3 if isinstance(iterable, Mapping):
613 if self:
614 self_get = self.get
615 for elem, count in iterable.items():
616 self[elem] = count + self_get(elem, 0)
617 else:
618 super(Counter, self).update(iterable) # fast path when counter is empty
619 else:
620 1 28114 28114.0 99.5 _count_elements(self, iterable)
621 1 13 13.0 0.0 if kwds:
622 self.update(kwds)
Just to give you an overview how these perform depending on the number of elements, for comparison I included an optimized version of your count and the _count_elements
function that is used by Counter
. However I excluded the part where you sorted
the items and created a list of the counts to avoid other effects - especially because sorted
has a different run-time behavior (O(n log(n))
) than counting (O(n)
):
# Setup
import random
from collections import Counter, _count_elements
def count(iterable):
"""Explicit iteration over items."""
dctCounter = {}
for item in iterable:
if item in dctCounter:
dctCounter[item] += 1
else:
dctCounter[item] = 1
return dctCounter
def count2(iterable):
"""Iterating over the indices"""
dctCounter = {}
lenLstItems = len(iterable)
for idx in range(lenLstItems):
item = iterable[idx]
if item in dctCounter.keys():
dctCounter[item] += 1
else:
dctCounter[item] = 1
return dctCounter
def c_count(iterable):
"""Internal counting function that's used by Counter"""
d = {}
_count_elements(d, iterable)
return d
# Timing
timings = {Counter: [], count: [], count2: [], c_count: []}
for i in range(1, 20):
print(2**i)
it = [random.randint(0, 2**i) for _ in range(2**i)]
for func in (Counter, count, count2, c_count):
res = %timeit -o func(it)
timings[func].append(res)
# Plotting
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(1)
ax = plt.subplot(111)
n = 2**np.arange(1, 5)
ax.plot(n,
[time.average for time in timings[count]],
label='my custom function', c='red')
ax.plot(n,
[time.average for time in timings[count2]],
label='your custom function', c='green')
ax.plot(n,
[time.average for time in timings[Counter]],
label='Counter', c='blue')
ax.plot(n,
[time.average for time in timings[c_count]],
label='_count_elements', c='purple')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('elements')
ax.set_ylabel('time to count them [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
And the result:

Individual timings:
2
30.5 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.67 µs ± 3.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
6.03 µs ± 31.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.67 µs ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
4
30.7 µs ± 75.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2.63 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.81 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
1.97 µs ± 5.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
8
34.3 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
4.3 µs ± 16.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.3 µs ± 23.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3 µs ± 25.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16
34.2 µs ± 599 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.46 µs ± 42 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.5 µs ± 83.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.24 µs ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
32
38.4 µs ± 578 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
13.7 µs ± 95.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.8 µs ± 383 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.56 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
64
43.5 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
24 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
52.8 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
11.6 µs ± 83.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
128
53.5 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
47.8 µs ± 507 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
101 µs ± 3.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
21.7 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
256
69.6 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
92.1 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
188 µs ± 3.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
39.5 µs ± 1.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
512
123 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
200 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
409 µs ± 3.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
90.9 µs ± 958 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1024
230 µs ± 2.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
428 µs ± 5.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
855 µs ± 14.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
193 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2048
436 µs ± 7.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
868 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.76 ms ± 31.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
386 µs ± 1.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4096
830 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.8 ms ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.75 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.06 ms ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
8192
2.3 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.8 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.8 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.69 ms ± 100 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
16384
4.53 ms ± 349 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.22 ms ± 359 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.9 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.9 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32768
9.6 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.2 ms ± 51.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
32.5 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
10.4 ms ± 687 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
65536
24.8 ms ± 490 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
40.1 ms ± 710 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
66.8 ms ± 271 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24.5 ms ± 659 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
131072
54.6 ms ± 756 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
84.2 ms ± 728 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
142 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
54.1 ms ± 424 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
262144
120 ms ± 4.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
182 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
296 ms ± 3.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
117 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
524288
244 ms ± 3.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
368 ms ± 2.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
601 ms ± 13.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
252 ms ± 6.13 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)