Why is sorting a python list of tuples faster when I explicitly provide the key as the first element?

Question

Sorting a list of tuples (dictionary keys,values pairs where the key is a random string) is faster when I do not explicitly specify that the key should be used (edit: added operator.itemgetter(0) from comment by @Chepner and the key version is now faster!):

import timeit

setup ="""
import random
import string

random.seed('slartibartfast')
d={}
for i in range(1000):
    d[''.join(random.choice(string.ascii_uppercase) for _ in range(16))] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
        setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=lambda x: x[0]): pass',
        setup=setup).repeat(7, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass',
        setup=setup).repeat(7, 1000))

Gives:

0.575334150664
0.579534521128
0.523808984422 (the itemgetter version!)

If however I create a custom object passing the key=lambda x: x[0] explicitly to sorted makes it faster:

setup ="""
import random
import string

random.seed('slartibartfast')
d={}

class A(object):
    def __init__(self):
        self.s = ''.join(random.choice(string.ascii_uppercase) for _ in
              range(16))
    def __hash__(self): return hash(self.s)
    def __eq__(self, other):
        return self.s == other.s
    def __ne__(self, other): return self.s != other.s
    # def __cmp__(self, other): return cmp(self.s ,other.s)

for i in range(1000):
    d[A()] = 0
"""
print min(timeit.Timer('for k,v in sorted(d.iteritems()): pass',
        setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=lambda x: x[0]): pass',
        setup=setup).repeat(3, 1000))
print min(timeit.Timer('for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass',
        setup=setup).repeat(3, 1000))

Gives:

4.65625458083
1.87191002252
1.78853626684

Is this expected ? Seems like second element of the tuple is used in the second case but shouldn't the keys compare unequal ?

Note: uncommenting the comparison method gives worse results but still the times are at one half:

8.11941771831
5.29207000173
5.25420037046

As expected built in (address comparison) is faster.

EDIT: here are the profiling results from my original code that triggered the question - without the key method:

         12739 function calls in 0.007 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.007    0.007 <string>:1(<module>)
        1    0.000    0.000    0.007    0.007 __init__.py:6527(_refreshOrder)
        1    0.002    0.002    0.006    0.006 {sorted}
     4050    0.003    0.000    0.004    0.000 bolt.py:1040(__cmp__) # here is the custom object
     4050    0.001    0.000    0.001    0.000 {cmp}
     4050    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
      291    0.000    0.000    0.000    0.000 __init__.py:6537(<lambda>)
      291    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 bolt.py:1240(iteritems)
        1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

and here are the results when I specify the key:

         7027 function calls in 0.004 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.004    0.004 __init__.py:6527(_refreshOrder)
        1    0.001    0.001    0.003    0.003 {sorted}
     2049    0.001    0.000    0.002    0.000 bolt.py:1040(__cmp__)
     2049    0.000    0.000    0.000    0.000 {cmp}
     2049    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'sort' of 'list' objects}
      291    0.000    0.000    0.000    0.000 __init__.py:6538(<lambda>)
      291    0.000    0.000    0.000    0.000 __init__.py:6533(<lambda>)
      291    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 bolt.py:1240(iteritems)
        1    0.000    0.000    0.000    0.000 {method 'iteritems' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Apparently it is the __cmp__ and not the __eq__ method that is called (edit cause that class defines __cmp__ but not __eq__, see here for the order of resolution of equal and compare).

In the code here __eq__ method is indeed called (8605 times) as seen by adding debug prints (see the comments).

So the difference is as stated in the answer by @chepner. The last thing I am not quite clear on is why are those tuple equality calls needed (IOW why we need to call eq and we don't call cmp directly).

FINAL EDIT: I asked this last point here: Why in comparing python tuples of objects is __eq__ and then __cmp__ called? - turns out it's an optimization, tuple's comparison calls __eq__ in the tuple elements, and only call cmp for not eq tuple elements. So this is now perfectly clear. I thought it called directly __cmp__ so initially it seemed to me that specifying the key is just unneeded and after Chepner's answer I was still not getting where the equal calls come in.

Gist: https://gist.github.com/Utumno/f3d25e0fe4bd0f43ceb9178a60181a53

@That1Guy, sorry just deleted the comment by mistake, you are talking about c speed vs python so you will get hammered using the methods above in pure python — Padraic Cunningham, Dec 24 '15 at 17:01
@That1Guy, the main difference here is if you add a `print(self, other )` in `eq` you won't see it being called at all for the lambda solution, for the non lambda it is called 88 times so you have 88 slow python method calls — Padraic Cunningham, Dec 24 '15 at 17:09
Comparing random objects of the same type should result in comparing their addresses (in CPython) so the first elements should compare unequal in both setups - will edit making the keys 16 chars so they always compare unequal in both setups - same results. So it seems the second element would not be needed in the comparison - so why does it make it faster explicitly omitting it in the second case ? @PadraicCunningham: is the `__eq__` call you believe ? Why is it called in the non lambda case ? — Mr_and_Mrs_D, Dec 24 '15 at 17:16
@Mr_and_Mrs_D, in the non-lambda case eq is called, that is what is killing you. — Padraic Cunningham, Dec 24 '15 at 17:23
@Padraic Cunningham: but why is it called ? I would expect to compare first elements of the tuple in both cases (lambda and non lambda) and those first elements compare unequal in both cases - so I would expect same performance. Is tuple comparison the difference and if so why ? So I would imagine the non lambda to be roughly equivalent to `cmp= lambda x, y: x[0] < y[0] or x[1] < y[1]` - where is `eq` here ? — Mr_and_Mrs_D, Dec 24 '15 at 17:33
Your first example seems to show the opposite of what you claim it does. — Adam Smith, Dec 24 '15 at 17:55
@AdamSmith: actually yes - changed when I switched from random strings to random ints. Reverting to the random string version. Random ints version: http://stackoverflow.com/revisions/a362cb41-59f5-46cf-b04d-35157d78111f/view-source — Mr_and_Mrs_D, Dec 24 '15 at 18:03

score 8 · Accepted Answer · answered Dec 24 '15 at 18:03

8

There are two issues at play.

Comparing two values of builtin types (such as int) happens in C. Comparing two values of a class with an __eq__ method happens in Python; repeatedly calling __eq__ imposes a significant performance penalty.
The function passed with key is called once per element, rather than once per comparison. This means that lambda x: x[0] is called once to build a list of A instances to be compared. Without key, you need to make O(n lg n) tuple comparisons, each of which requires a call to A.__eq__ to compare the first element of each tuple.

The first explains why your first pair of results is under a second while the second takes several seconds. The second explains why using key is faster regardless of the values being compared.

answered Dec 24 '15 at 18:03

chepner

497,756
71
530
681

Thanks - why in my edited int version the key version is faster: http://stackoverflow.com/revisions/a362cb41-59f5-46cf-b04d-35157d78111f/view-source while here is actually slower ? Also the x[0] are instances of A and I still need O(nlogn) comparisons to sort the list returned by using the lambda key - is not in those comparisons `__eq__` called ? – Mr_and_Mrs_D Dec 24 '15 at 18:09
1

It's still O(n lg n), but with a smaller constant. (Instead of making O(n lg n) tuple comparisons and O(n lg n) `A` comparisons, you are making O(n) key lookups and O(n lg n) `A` comparisons). – chepner Dec 24 '15 at 18:12
Aha - thanks - so the `eq` is called in the tuple comparison ? Why is then the key example in the question slower in the first case ? – Mr_and_Mrs_D Dec 24 '15 at 18:14
Tuples and ints are both compared in C; there are no additional Python functions being called. Calling the lambda to extract the first integer from the tuple is slower than just comparing the tuples directly. Try it with `key=operator.itemgetter(0)` instead (after importing `operator`, of course). – chepner Dec 24 '15 at 18:16
It is faster in the integers case: http://stackoverflow.com/revisions/a362cb41-59f5-46cf-b04d-35157d78111f/view-source but I guess cause there the tuple eq is completed bypassed (some optimization for integer tuples ?) The last bit that's still not clear for me why we need the extra O(n lg n) `__eq__` in the non key case ? – Mr_and_Mrs_D Dec 25 '15 at 16:26
Ok finally completely got that - the ingredient I was missing is that tuple comparison does not call `__cmp__` directly on the tuple elements - it first calls `__eq__`: http://stackoverflow.com/q/37513671/281545 - now the comments above make sense – Mr_and_Mrs_D May 30 '16 at 17:36

Why is sorting a python list of tuples faster when I explicitly provide the key as the first element?

1 Answers1