0

In reference to this older stack overflow question @Raymond Hettinger gets presumeably correct results where Counter is 4x faster then sorted using timeit from the command line like so:

python3.6 -m timeit -s 'from collections import Counter' -s 'from random import shuffle' -s 't=list(range(100)) * 5' -s 'shuffle(t)' -s 'u=t[:]' -s 'shuffle(u)' 'Counter(t)==Counter(u)'

My results indicate that sorted is significantly faster then Counter! Am I using timeit incorrectly? Interpreting the results wrong? Is the setup data somehow producing different results?

import timeit, functools
from collections import Counter


def sorted_lists(l1,l2):
    return sorted(l1) == sorted(l2)


def counted_lists(l1,l2):
    return Counter(l1) == Counter(l2)


short1 = [0,1,2,3,4,5,5]
short2 = [0,1,5,3,4,5,2]
long1 = list(range(0, 1000)) + [100, 10, 1000, 5]
long2 = list(range(0, 1000)) + [5, 10, 100, 1000]

number = 1000

# Long test
t = timeit.Timer(lambda: sorted_lists(long1, long2))
rl1 = t.timeit(number)
print("sorted long  :{}".format(rl1))

t = timeit.Timer(lambda: counted_lists(long1, long2))
rl2 = t.timeit(number)
print("counted long :{}".format(rl2)


# Short test
t = timeit.Timer(functools.partial(sorted_lists, short1, short2))
rs1 = t.timeit(number)
print("sorted short :{}".format(rs1))

t = timeit.Timer(functools.partial(counted_lists, short1, short2))
rs2 = t.timeit(number)
print("counted short:{}".format(rs2)

The output is fairly consistent:

sorted long  :0.04470205499092117 # less time = fastest
counted long :0.1182843999704346

sorted short :0.0012896459666080773 # less time = fastest
counted short:0.009829471004195511

Both sets of tests were run in python 3.6.

Community
  • 1
  • 1
Arctelix
  • 4,478
  • 3
  • 27
  • 38
  • Different machines using different version of software yield different results. That answer is from 2011, meaning you are almost definitely running a newer version of python than they were, maybe you're running python 3 vs their python 2. – Shadow Oct 21 '16 at 04:04
  • We are both testing on python 3.6 as per the recent comments on @Raymond Hettinger's answer to the old question. Also, times will vary based on hardware, but not to this extent, where a builtin function will be significantly faster then another on one machine and significantly slower on the other. – Arctelix Oct 21 '16 at 04:41
  • 1
    Your inputs are small. A factor of log(n) doesn't mean much at this scale; constant factors can easily dominate the performance comparison. – user2357112 Oct 21 '16 at 04:42
  • 1
    Also, your longer inputs are nearly sorted, and `sorted` knows how to take advantage of that. It has a lot less work to perform than if the inputs were shuffled. – user2357112 Oct 21 '16 at 04:44
  • @user2357112 Yup, that was it. Sorted is really good at sorting things that are almost sorted or less then 50 items long! – Arctelix Oct 21 '16 at 05:04

1 Answers1

0

Thanks to @user2357112's comments above, if the input data is modified as follows:

short1 = list(range(10)) * 5
short2 = list(range(10)) * 5
long1 = list(range(100)) * 5
long2 = list(range(100)) * 5

shuffle(short1)
shuffle(short2)
shuffle(long1)
shuffle(long2)

results:

sorted long  :0.055621962004806846
counted long :0.04698559001553804

sorted short :0.008079404011368752
counted short:0.014304430980701

The results are approaching Raymonds's from the old question.

A few more tests revealed the following:

sorted seems to be faster at long nearly sorted lists or any list under 50 items

Count is significantly faster for totally randomized lists over 1000+ items long.

Arctelix
  • 4,478
  • 3
  • 27
  • 38
  • nice comparision. The result is not surprising though since `sorted` _sorts_ the list which it cannot do faster than in `O(n log n)` time, in fact no sort algorithm can as that's the theoretical [lower bound](https://en.wikipedia.org/wiki/Comparison_sort#Number_of_comparisons_required_to_sort_a_list). `Counter` on the other hand does not sort, it only counts which it can do by passing through the list once taking `O(n)` time. Hence for any _sufficiently large_ `n`, `Counter` by theory and as you show nicely in practice is expected to be faster for this particular task. – miraculixx Oct 21 '16 at 05:34