Summary
Sorting in Python is guaranteed to be stable since Python 2.2, as documented here and here.
Wikipedia explains what the property of being stable means for the behavior of the algorithm:
A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
However, when sorting objects, such as tuples, sorting appears to be unstable.
For example,
>>> a = [(1, 3), (3, 2), (2, 4), (1, 2)]
>>> sorted(a)
[(1, 2), (1, 3), (2, 4), (3, 2)]
However, to be considered stable, I thought the new sequence should've been
[(1, 3), (1, 2), (2, 4), (3, 2)]
because, in the original sequence, the tuple (1, 3)
appears before tuple (1, 2)
. The sorted
function is relying on the 2-ary "keys" when the 1-ary "keys" are equal. (To clarify, the 1-ary key of some tuple t
would be t[0]
and the 2-ary t[1]
.)
To produce the expected result, we have to do the following:
>>> sorted(a, key=lambda t: t[0])
[(1, 3), (1, 2), (2, 4), (3, 2)]
I'm guessing there's a false assumption on my part, either about sorted
or maybe on how tuple and/or list types are treated during comparison.
Questions
- Why is the
sorted
function said to be "stable" even though it alters the original sequence in this manner? - Wouldn't setting the default behavior to that of the
lambda
version be more consistent with what "stable" means? Why is it not set this way? - Is this behavior simply a side-effect of how tuples and/or lists are inherently compared (i.e. the false assumption)?
Thanks.
Please note that this is not about whether the default behavior is or isn't useful, common, or something else. It's about whether the default behavior is consistent with the definition of what it means to be stable (which, IMHO, does not appear to be the case) and the guarantee of stability mentioned in the docs.