50

I recently came across a weird discrepancy when trying to sort self-referential lists using .sort() and sorted(). I was hoping someone could shed some light on this. The code in question is the following:

lst = [1, 2, 3]

lst[0] = lst
lst[1] = lst
lst[2] = lst

print(lst)

print(sorted(lst))

lst.sort()

print(lst)

The above code produces the following output:

[[...], [...], [...]]
[[[...], [...], [...]], [[...], [...], [...]], [[...], [...], [...]]]
[[...], [...], [...]]

It's the output from print(sorted(lst)) that baffles me. Wonder if it is some form of recursion that's causing it?

ggorlen
  • 44,755
  • 7
  • 76
  • 106
abhishekbasu
  • 545
  • 4
  • 9
  • 13
    `sort` and `sorted` don't really have anything to do with that. It's enough to compare the output of `print(l)` and `print(list(l))` – DeepSpace Jul 05 '21 at 20:23
  • 11
    In other words, it is due to how new lists are constructed from an existing, self-reffering list, not to sorting – DeepSpace Jul 05 '21 at 20:23

2 Answers2

96

I'm going to call your list x, because frankly l looks too much like the number one and is throwing me off. So you've got a list x which looks like this

[x, x, x]

Now, we do

print(x)

Python is smart enough to say "hey, look, this list contains itself recursively, let's not print it again inside of itself." All of the places x appears in your list, we get [...]

[[...], [...], [...]]

Now consider

x.sort()
print(x)

We sort the list, which doesn't do much since every element is the same. However, crucially, it all happens in-place. The list started out looking like [x, x, x] and will end looking like [x, x, x], where x is our list. So the print looks the same.

[[...], [...], [...]]

Finally, your fun example.

sorted(x)

sorted, unlike list.sort, does not modify the list and instead produces a new list. Let's call this new list y. In your x.sort example, at the end, we have the same list x that looks like x = [x, x, x]. When we print the list, we immediately see the recursion and stop printing.

However, sorted(x) produces a new list. The list will still look like [x, x, x], but it's not the list x. It's a new list y = [x, x, x].

Now, we do

print(sorted(x))

Python sees a list of three elements: [x, x, x]. We look at each of those elements. We're printing y, so the fact that this list contains x is not a recursion problem; it's a perfectly ordinary list that contains other lists. So we print x inside of y. Now, one more layer down, we look inside x and see that it contains, lo and behold, x again. That is a recursion problem, but it happened one step later, because we made a new list that, although it looks identical to the original, is distinct.

[[[...], [...], [...]], [[...], [...], [...]], [[...], [...], [...]]]
Silvio Mayolo
  • 62,821
  • 6
  • 74
  • 116
  • 1
    I prefer `L` instead of `l` or `x` – Grijesh Chauhan Jul 06 '21 at 08:25
  • 27
    @GrijeshChauhan: names starting with an uppercase letter are typically used for constants. – Eric Duminil Jul 06 '21 at 11:36
  • 7
    Constants or classes; in any case not variables, for which the convention is `snake_case`. I personally see many data scientists and university students especially go against this and it makes their code so much harder to read. – theberzi Jul 06 '21 at 12:44
  • @EricDuminil I always emphasis on readability over practice and rules. I do not use `L` in project code but for this use-case `L` looks better than `x` *to me* -- still it is inexact science and everyone has their own preference. Yes, capitals mostly apt for constant naming. – Grijesh Chauhan Jul 07 '21 at 09:14
  • 3
    @theberzi: It's possible to write FORTRAN in any language. :) – Eric Duminil Jul 07 '21 at 09:51
4

Maybe you thought after lst[0] = lst that lst would be [[1,2,3],2,3]. If so, you'd be assuming = lst passes by value. But lists are passed by reference, so at that point lst is [lst,2,3].

To pass by value, use lst[0] = list(lst) etc. This time the right-hand side creates a new list, with the same value as before but a new reference, since in this context list(lst) is syntactic sugar for [v for v in lst].

As @DeepSpace noted, this fact about list also explains why print(list(lst)) is three times more verbose than print(lst); it prints [v for v in lst], which is just [lst, lst, lst].

J.G.
  • 295
  • 2
  • 13
  • We don't have these two *"call/pass by value"* *"call/pass by reference"* terminologies in Python. – S.B Jul 12 '22 at 11:37