1

Summary

Sorting in Python is guaranteed to be stable since Python 2.2, as documented here and here.

Wikipedia explains what the property of being stable means for the behavior of the algorithm:

A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.

However, when sorting objects, such as tuples, sorting appears to be unstable.

For example,

>>> a = [(1, 3), (3, 2), (2, 4), (1, 2)]
>>> sorted(a)
[(1, 2), (1, 3), (2, 4), (3, 2)]

However, to be considered stable, I thought the new sequence should've been

[(1, 3), (1, 2), (2, 4), (3, 2)]

because, in the original sequence, the tuple (1, 3) appears before tuple (1, 2). The sorted function is relying on the 2-ary "keys" when the 1-ary "keys" are equal. (To clarify, the 1-ary key of some tuple t would be t[0] and the 2-ary t[1].)

To produce the expected result, we have to do the following:

>>> sorted(a, key=lambda t: t[0])
[(1, 3), (1, 2), (2, 4), (3, 2)]

I'm guessing there's a false assumption on my part, either about sorted or maybe on how tuple and/or list types are treated during comparison.

Questions

  1. Why is the sorted function said to be "stable" even though it alters the original sequence in this manner?
  2. Wouldn't setting the default behavior to that of the lambda version be more consistent with what "stable" means? Why is it not set this way?
  3. Is this behavior simply a side-effect of how tuples and/or lists are inherently compared (i.e. the false assumption)?

Thanks.


Please note that this is not about whether the default behavior is or isn't useful, common, or something else. It's about whether the default behavior is consistent with the definition of what it means to be stable (which, IMHO, does not appear to be the case) and the guarantee of stability mentioned in the docs.

code_dredd
  • 5,915
  • 1
  • 25
  • 53
  • 4
    Again with the downvoters that don't even bother to provide an explanation. In what way does this question "not show any research effort, is unclear, or not useful"?... Those votes came before any reasonable amount of time needed to actually *read* the question had gone by... smh – code_dredd Sep 01 '17 at 10:44
  • I guess the downvoter consider it a lack of research that you only *assume* how tuples should be sorted. – GhostCat Sep 01 '17 at 10:46
  • @GhostCat While I can see that part of your feedback, I think it's unreasonable for anyone to assume that missing *one* detail negates the rest of the research that *did* go into it. In any case, I digress. – code_dredd Sep 01 '17 at 10:57
  • 1
    I didn't downvote, so I can only speculate. And honestly: I *rarely* see that questioners agree to downvotes. So that piece of information isn't exactly newsworthy. And just for the record: you focused your research on the *stable* part. You then assumed that **your** idea how tuples should be sorted is correct. Maybe some folks simply understand how well designed and "perfect" the python sort implementation is. So instead of asking "woha, why is python sort not stable" ... a title like "where is the flaw in my logic" would have resulted in less downvotes. But whatever. Nice question. – GhostCat Sep 01 '17 at 11:20
  • @GhostCat `"I rarely see that questioners agree to downvotes"` may be true generally, but not every dv is necessarily well justified either. But, that's moot in this case, b/c no one ever bothered to give any, so we're just speculating; `"you focused your research on the stable part"` is partially right and partially off. I went through the docs; it's just that there's a *lot* to read and I missed it when I looked. The problem I see when I document everything I tried *prior* to posting is that the post becomes too verbose and then people don't bother to read it in the first place. – code_dredd Sep 02 '17 at 07:12
  • 1
    `["the docs" is] a` *lot* `to read` moreover, the bit about how *sequences of the same type compare* is *not* linked or repeated near the [claim to stability](https://docs.python.org/3/library/functions.html#sorted). – greybeard Sep 02 '17 at 07:40

3 Answers3

5

Think about it - (1, 2) comes before (1, 3), does it not? Sorting a list by default does not automatically mean "just sort it based off the first element". Otherwise you could say that apple comes before aardvark in the alphabet. In other words, this has nothing to do with stability.

The docs also have a nice explanation about how data structures such as lists and tuples are sorted lexicographically:

In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length.

TerryA
  • 58,805
  • 11
  • 114
  • 143
  • 2
    I must've missed that in the docs; it's a long document. I understand the string comparison example (i.e. char lists), but hadn't extrapolated the behavior to tuples when thinking about the other non-string types (i.e. the iterable itself being compared lexicographically, not just the characters in a string). – code_dredd Sep 01 '17 at 11:06
4

Stable sort keeps the order of those elements which are considered equal from the sorting point of view. Because tuples are compared element by element lexicographically, (1, 2) precedes (1, 3), so it should go first:

>>> (1, 2) < (1, 3)
True
bereal
  • 32,519
  • 6
  • 58
  • 104
1

A tuple's key is made out of all of its items.

>>> (1,2) < (1,3)
True
tzot
  • 92,761
  • 29
  • 141
  • 204
Bharel
  • 23,672
  • 5
  • 40
  • 80