-2

sorted() documentation (emphasis mine):

The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal [...]

In the code below, keys 8 and 16 have the same value, that is they would compare equal by the key=lambda x: d[x]:

d = {8: 0, 16: 0, 'a': 1}    

for i in range(200):    
    print(sorted(d, key=lambda x: d[x]))

# Prints always the same 
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a']
# ......

Now lets try a small modification to the dictionary:

Example 1:

for i in range(10):

    if i == 5:

        del d[8]
        del d[16]

        d.update({16: 0})
        d.update({8: 0})

    print(sorted(d, key=lambda x: d[x]))

# Prints: 
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a']
# [8, 16, 'a'] 
# [16, 8, 'a']    <----- it changed the order
# [16, 8, 'a']
# [16, 8, 'a']
# [16, 8, 'a']
# [16, 8, 'a']

id(d) remains the same. Also the contents remain the same. What changes is the order of key insertion.

Note:
Those keys were chosen so that they cause hash collisions:

whether 8 occupies a location or 16 is determined by which one arrived at the party first


Example 2:

d = {8: 0, 16: 0, 'a': 1}
c = {16: 0, 8: 0, 'a': 1}

print(sorted(d, key=lambda x: d[x]))
print(sorted(c, key=lambda x: d[x]))

I know that d and c are different objects here:

id(d)  # 140067442777736
id(c)  # 140067442811016

But I would expect that sorted() treats objects that are d == c the exact same way.


Example 3: A different example would be the following:

d = {'a': 0, 'b': 0, 'c': 0, 'd': 1}

print(sorted(d, key=lambda x: d[x]))

Running this on multiple different runs (not with a for loop) would give a different order each time.

Note: That's assuming you use Python 3.3 or higher and your PYTHONHASHSEED=random


Question:
The order is not stable:

  • for the same object that gets its order modified (example 1).
  • for 2 objects that compare equal (example 2).
  • for different runs of the exact same code (example 3).

Is the order instability mentioned above a bug or am I missing something? Do I misunderstand the documentation by expecting all 3 examples to have a stable order?

user
  • 5,370
  • 8
  • 47
  • 75
  • 1
    What makes you think sorted is at fault here? The order of the keys changed. `sorted()` maintains the relative order you give to it. Giving it `[16, 8]` is different from giving it `[8, 16]`. – Martijn Pieters Sep 13 '15 at 09:37
  • @MartijnPieters Its documentation made me think that on different runs, it would give the same result. Which it doesn't. Yes, `[16,8] != [8,16]`. **However** `{16,8} == {8,16}`. – user Sep 13 '15 at 09:43
  • 1
    So? Just because the dictionaries are equal doesn't mean that sorted can see that or cares. It is given a sequence of keys, in a given order. You changed that order. – Martijn Pieters Sep 13 '15 at 09:46
  • but you are not sorting based on `16` or `8` at all . You are sorting based on its corresponding values from dictionary. If the values are same sorted would use the element that came first. Try out something like - `sorted([3,4,5,1,2],key=lambda x:1)` , would you really expect sorted to change the order? – Anand S Kumar Sep 13 '15 at 09:46
  • If you called `list()` on the dictionary versions you also get different lists. `list({8, 16}) != list({16, 8})`. You reintroduced *ordering*. – Martijn Pieters Sep 13 '15 at 09:49
  • @AnandSKumar No, I wouldn't expect it to change, but I dont see how this is relevant. My point is, the documentation states "stable order" which I needed to know whether it includes (or excludes) the examples in my question. – user Sep 13 '15 at 10:22
  • 1
    its not `sorted()` that is changing the order, `sorted()` is getting the changed order from dictionary itself. – Anand S Kumar Sep 13 '15 at 10:25
  • @AnandSKumar I know. :) My problem is that I would expect the order of the dict or set given to `sorted()` to be irrelevant of its output. – user Sep 13 '15 at 10:32
  • Why would you expect that? When you do not expect the order to be changed for - `sorted([3,4,5,1,2],key=lambda x:1)` ? Or in other words why would you expect that when these produce different results - `sorted([3,4,5,1,2],key=lambda x:1)` and `sorted([3,5,4,2,1],key=lambda x:1)` – Anand S Kumar Sep 13 '15 at 10:37
  • @AnandSKumar Well... those are lists. Their order will be the same no matter how many times you run that code. A dict/set order can change based on PYTHONHASHSEED or hash collisions. – user Sep 13 '15 at 10:41
  • @user536231 The order of the input to `sorted` is _critical_ to its output. That's what "stable sort" means --- a sorting method that preserves the original ordering of equal-ish elements of its input. If the input is altered so that their relative order changes, then the output _must_ also change to match their new relative order. Again, that's what "stable sort" _means_. – Kevin J. Chase Sep 13 '15 at 10:43
  • @user536231 In your Example 2, you said "I would expect that sorted() treats objects that are d == c the exact same way." But you then fed `sorted` two _unequal_ objects! In Python 2, `c.keys()` returns `[16, 8, 'a']`, while `d.keys()` is `[8, 16, 'a']`. It doesn't remotely matter where those lists came from, what matters is _they're not equal_ --- specifically, the order of the "equalish" elements has changed. Therefore, `sorted` _must_ return two unequal lists. Again, that's what it means to be a stable sort, and what the sentence you highlighted guarantees. – Kevin J. Chase Sep 13 '15 at 11:16
  • @user536231: the documentation never claims that two dictionaries will always sort the same way. Are you confusing the *compare equal* part of the documentation here? That refers to the values *in* the sequence, not the dictionaries that were the source of the sequence. – Martijn Pieters Sep 13 '15 at 12:23

1 Answers1

6

It is not a bug, you missed something. You manipulated the dictionary changing the order of iteration, and it is that order that is kept stable by sorted(). Or put differently, sorted() keeps the input order stable, and you changed that input order by altering the dictionary.

Note that sorted() doesn't 'see' a dictionary here. It is passed a sequence of keys, just as if you used list() on the dictionary first. In this case, both 8 and 16 hash to the same hash-table slot. Which one is listed first depends on the order of insertions:

>>> d1 = {}
>>> d1[8] = None
>>> d1[16] = None
>>> d1
{8: None, 16: None}
>>> list(d1)
[8, 16]
>>> d2 = {}
>>> d2[16] = None
>>> d2[8] = None
>>> d2
{16: None, 8: None}
>>> list(d2)
[16, 8]

Calling list() on the two dictionaries shows that the keys are listed in a different order. sorted() just maintained that order as it iterates over the dictionary, since they both are otherwise sorted in the same location, by value. This is exactly what the documentation tells you would happen.

Note that dictionary equality has no role to play here, at all. That's not what the compare equal part of the documentation is referring to. That only refers to the elements in the sequence; if the elements are equal (or in this case, if the key(element) values are equal), then those elements retain their relative order.

So to focus on possible things you missed here:

  • The signature of the method is sorted(iterable[, key][, reverse]); the key word there is iterable. The first line of the documentation:

    Return a new sorted list from the items in iterable.

    Emphasis mine; it is the items from the iterable that are sorted. The Python glossary defines iterable as:

    An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...] When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values.

    Anything that takes an iterable basically will call iter() on it to loop over the sequence produced.

  • Dictionaries happen to be iterable, and iterating gives you the keys. See the mapping types documentation:

    iter(d)
    Return an iterator over the keys of the dictionary. This is a shortcut for iter(d.keys()).

    The dict.keys() documentation then points to the dictionary views section, which states:

    iter(dictview)
    Return an iterator over the keys, values or items (represented as tuples of (key, value)) in the dictionary.

    Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond. This allows the creation of (value, key) pairs using zip(): pairs = zip(d.values(), d.keys()). Another way to create the same list is pairs = [(v, k) for (k, v) in d.items()].

    Again, emphasis mine. So sorted() iterates, taking the items to sort. Dictionaries, when iterated, produce keys whose order is not stable.

  • Finally, the section you quoted, never contradicts this:

    The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal.

    So the elements in the sequence iterated do not change order. Nowhere does this state that dictionaries cannot produce a different order when iterated however.

    In other words, it doesn't matter if iterable_a == iterable_b here, it is not about iterable equality, only element equality matters to the sort order stability. If the iteration order differs, that order is kept stable.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343