Why does this key class for sorting heterogeneous sequences behave oddly?

Question

Python 3.x's sorted() function cannot be relied on to sort heterogeneous sequences, because most pairs of distinct types are unorderable (numeric types like int, float, decimal.Decimal etc. being an exception):

Python 3.4.2 (default, Oct  8 2014, 08:07:42) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["one", 2.3, "four", -5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: float() < str()

In contrast, comparisons between objects that have no natural order are arbitrary but consistent in Python 2.x, so sorted() works:

Python 2.7.8 (default, Aug  8 2014, 14:55:30) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> sorted(["one", 2.3, "four", -5])
[-5, 2.3, 'four', 'one']

In order to replicate Python 2.x's behaviour in Python 3.x, I wrote a class to use as the key parameter to sorted(), which relies on the fact that sorted() is guaranteed to use only less-than comparisons:

class motley:

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        try:
            return self.value < other.value
        except TypeError:
            return repr(type(self.value)) < repr(type(other.value))

Example usage:

>>> sorted(["one", 2.3, "four", -5], key=motley)
[-5, 2.3, 'four', 'one']

So far, so good.

However, I've noticed a surprising behaviour when sorted(s, key=motley) is called with certain sequences containing complex numbers:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
[(1+0j), 0.0, False, (2+3j), 1]

I would have expected 0.0, False and 1 to be in one group (because they are mutually orderable), and (1+0j) and (2+3j) in another (because they are of the same type). The fact that the complex numbers in this result are not only separated from each other, but one of them is sitting in the middle of a group of objects that are comparable with each other but not with it, is somewhat perplexing.

What's going on here?

I'm not sure, but what you describe might also be an effect of the fact that Python's sorting algorithm is supposed to be stable, in that it doesn't reorder a sequence that is already sorted. See [Timsort](http://en.wikipedia.org/wiki/Timsort). — Lukas Graf, Oct 25 '14 at 22:22

BrenBarn · Accepted Answer · 2014-10-25T22:41:34.550

You do not know what order the comparisons are done in, or even which items are compared, which means you can't really know what effect your __lt__ will have. Your defined __lt__ sometimes depends on the actual values, and sometimes on the string representations of the types, but both versions may be used for the same object in the course of the sort. This means that your ordering is not determined solely by the objects in the list, but also may depend on their initial order. This in turn means that just because objects are mutually comparable does not mean they will be sorted together; they may be "blocked" by an incomparable object between them.

You can get an inkling of what is going on by putting some debugging prints in to see what it's comparing:

class motley:

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        fallback = False
        try:
            result = self.value < other.value
        except TypeError:
            fallback = True
            result = repr(type(self.value)) < repr(type(other.value))
        symbol = "<" if result else ">"
        print(self.value, symbol, other.value, end="")
        if fallback:
            print(" -- because", repr(type(self.value)), symbol, repr(type(other.value)))
        else:
            print()
        return result

Then:

>>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley)
1 > 0.0
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 1 -- because <class 'complex'> < <class 'int'>
(1+0j) < 0.0 -- because <class 'complex'> < <class 'float'>
False > 0.0
False < 1
(2+3j) > False -- because <class 'complex'> > <class 'bool'>
(2+3j) < 1 -- because <class 'complex'> < <class 'int'>
[(1+0j), 0.0, False, (2+3j), 1]

You can see, for instance, that the type-based ordering is used for comparing the complex number to 1, but not for comparing 1 and 0. Likewise 0.0 < False for "normal" reasons, but 2+3j > False for type-name-based reasons.

The result is that it sorts 1+0j to the beginning, and then leaves 2+3j where it is above False. It never even attempts to compare the two complex numbers to each other, and the only item they are both compared to is 1.

More generally, your approach can lead to an intransitive ordering with appropriate choices for the names of the types involved. For instance, if you define classes A, B, and C, such that A and C can be compared, but they raise exceptions when comparing to B, then by creating objects a, b and c (from the respective classes) such that c < a, you can create a cycle a < b < c < a. a < b < c will be true because the classes will be compared based on their names, but c < a since these types can be directly compared. With an intransitive ordering, there is no hope of a "correct" sorted order.

You can even do this with builtin types, although it requires getting a little creative to think of objects whose type names are in the right alphabetical sequence:

>>> motley(1.0) < motley(lambda: 1) < motley(0) < motley(1.0)
True

(Because 'float' < 'function' :-)

Heh ... yeah, the `a < b < c < a` cycle does kind of kill the whole approach, doesn't it? Thanks for the thorough answer :-) — Zero Piraeus, Oct 25 '14 at 22:46
Thought you might be interested to know I've posted a [followup question](http://stackoverflow.com/questions/26575183/how-can-i-get-2-x-like-sorting-behaviour-in-python-3-x) ... — Zero Piraeus, Oct 26 '14 at 16:48

Why does this key class for sorting heterogeneous sequences behave oddly?

1 Answers1

Linked