Python `Counter` seems slower than `sorted` in comparison task - why?

Question

In reference to this older stack overflow question @Raymond Hettinger gets presumeably correct results where Counter is 4x faster then sorted using timeit from the command line like so:

python3.6 -m timeit -s 'from collections import Counter' -s 'from random import shuffle' -s 't=list(range(100)) * 5' -s 'shuffle(t)' -s 'u=t[:]' -s 'shuffle(u)' 'Counter(t)==Counter(u)'

My results indicate that sorted is significantly faster then Counter! Am I using timeit incorrectly? Interpreting the results wrong? Is the setup data somehow producing different results?

import timeit, functools
from collections import Counter


def sorted_lists(l1,l2):
    return sorted(l1) == sorted(l2)


def counted_lists(l1,l2):
    return Counter(l1) == Counter(l2)


short1 = [0,1,2,3,4,5,5]
short2 = [0,1,5,3,4,5,2]
long1 = list(range(0, 1000)) + [100, 10, 1000, 5]
long2 = list(range(0, 1000)) + [5, 10, 100, 1000]

number = 1000

# Long test
t = timeit.Timer(lambda: sorted_lists(long1, long2))
rl1 = t.timeit(number)
print("sorted long  :{}".format(rl1))

t = timeit.Timer(lambda: counted_lists(long1, long2))
rl2 = t.timeit(number)
print("counted long :{}".format(rl2)


# Short test
t = timeit.Timer(functools.partial(sorted_lists, short1, short2))
rs1 = t.timeit(number)
print("sorted short :{}".format(rs1))

t = timeit.Timer(functools.partial(counted_lists, short1, short2))
rs2 = t.timeit(number)
print("counted short:{}".format(rs2)

The output is fairly consistent:

sorted long  :0.04470205499092117 # less time = fastest
counted long :0.1182843999704346

sorted short :0.0012896459666080773 # less time = fastest
counted short:0.009829471004195511

Both sets of tests were run in python 3.6.

Different machines using different version of software yield different results. That answer is from 2011, meaning you are almost definitely running a newer version of python than they were, maybe you're running python 3 vs their python 2. — Shadow, Oct 21 '16 at 04:04
We are both testing on python 3.6 as per the recent comments on @Raymond Hettinger's answer to the old question. Also, times will vary based on hardware, but not to this extent, where a builtin function will be significantly faster then another on one machine and significantly slower on the other. — Arctelix, Oct 21 '16 at 04:41
Your inputs are small. A factor of log(n) doesn't mean much at this scale; constant factors can easily dominate the performance comparison. — user2357112, Oct 21 '16 at 04:42
Also, your longer inputs are nearly sorted, and `sorted` knows how to take advantage of that. It has a lot less work to perform than if the inputs were shuffled. — user2357112, Oct 21 '16 at 04:44
@user2357112 Yup, that was it. Sorted is really good at sorting things that are almost sorted or less then 50 items long! — Arctelix, Oct 21 '16 at 05:04

score 0 · Answer 1 · answered Oct 21 '16 at 05:24

Thanks to @user2357112's comments above, if the input data is modified as follows:

short1 = list(range(10)) * 5
short2 = list(range(10)) * 5
long1 = list(range(100)) * 5
long2 = list(range(100)) * 5

shuffle(short1)
shuffle(short2)
shuffle(long1)
shuffle(long2)

results:

sorted long  :0.055621962004806846
counted long :0.04698559001553804

sorted short :0.008079404011368752
counted short:0.014304430980701

The results are approaching Raymonds's from the old question.

A few more tests revealed the following:

sorted seems to be faster at long nearly sorted lists or any list under 50 items

Count is significantly faster for totally randomized lists over 1000+ items long.

nice comparision. The result is not surprising though since `sorted` _sorts_ the list which it cannot do faster than in `O(n log n)` time, in fact no sort algorithm can as that's the theoretical [lower bound](https://en.wikipedia.org/wiki/Comparison_sort#Number_of_comparisons_required_to_sort_a_list). `Counter` on the other hand does not sort, it only counts which it can do by passing through the list once taking `O(n)` time. Hence for any _sufficiently large_ `n`, `Counter` by theory and as you show nicely in practice is expected to be faster for this particular task. — miraculixx, Oct 21 '16 at 05:34

Python `Counter` seems slower than `sorted` in comparison task - why?

1 Answers1