1

EDIT: This question is about why the behavior is what it is, not how to get around it, which is what the alleged duplicate is about.


I've used the following notation to create lists of a certain size in different cases. For example:

>>> [None] * 5
[None, None, None, None, None]
>>>

This appears to work as expected and is shorter than:

>>> [None for _ in range(5)]
[None, None, None, None, None]
>>>

I then tried to create an list of lists using the same approach:

>>> [[]] * 5
[[], [], [], [], []]
>>>

Fair enough. It seems to work as expected.

However, while going through the debugger, I noticed that all the sub-list buckets had the same value, even though I had added only a single item. For example:

>>> t = [[]] * 5
>>> t
[[], [], [], [], []]
>>> t[1].append(4)
>>> t
[[4], [4], [4], [4], [4]]
>>> t[0] is t[1]
True
>>>

I was not expecting all top-level array elements to be references to a single sub-list; I expected 5 independent sub-lists.

For that, I had to write code like so:

>>> t = [[] for _ in range(5)]
>>> t
[[], [], [], [], []]
>>> t[2].append(4)
>>> t
[[], [], [4], [], []]
>>> t[0] is t[1]
False
>>>

I'm clearly missing something, probably a historical fact or simply a different way in which the consistency here is viewed.

Can someone explain why two different code snippets that one would reasonably expect to be equivalent to each other actually end up implicitly producing different and non-obvious (IMO) results, especially given Python's zen of always being explicit and obvious?

Please note that I'm already aware of this question, which is different to what I'm asking.

I'm simply looking for a detailed explanation/justification. If there're historical, technical, and/or theoretical reasons for this behavior, then please be sure to include a reference or two.

eyllanesc
  • 235,170
  • 19
  • 170
  • 241
code_dredd
  • 5,915
  • 1
  • 25
  • 53
  • Keep in mind you are using Python *lists*, not arrays. – juanpa.arrivillaga Jul 04 '17 at 23:57
  • @juanpa.arrivillaga up until now I thought python just renamed arrays to lists just for the sake of it – Qwertie Jul 05 '17 at 00:00
  • 1
    There is definitely an answer for this....can't seem to find it. But this is well explained in quite a few places. – idjaw Jul 05 '17 at 00:00
  • Also, from [this](https://stackoverflow.com/questions/18946728/changing-an-element-in-one-list-changes-multiple-lists) there are a few more questions inside that will provide more explanations. – idjaw Jul 05 '17 at 00:03
  • @juanpa.arrivillaga Ah. So, this *isn't* asking about what I'm flagging as? So, should I remove? It sure seems quite like it though.. – idjaw Jul 05 '17 at 00:04
  • ```t = [[]] * 5``` creates five of the same object, the list comprehension creates five separate objects. – wwii Jul 05 '17 at 00:04
  • @wwii yes, but I think the OP already know that, they are asking *why*. – juanpa.arrivillaga Jul 05 '17 at 00:06
  • @juanpa.arrivillaga Didn't you close this question? – cs95 Jul 05 '17 at 00:08
  • 2
    @idjaw well, honestly, I'm torn on this one. I had originally made it a dupe target of that well-known one, but reopened it upon further consideration. The dupe target asks " Can someone please explain what's going on, and how to get around it?" But the OP already *knows* what's going on, and how to get around it. Their question is *why are these two approaches not semantically equivalent*. – juanpa.arrivillaga Jul 05 '17 at 00:09
  • @cᴏʟᴅsᴘᴇᴇᴅ I reopened it. See my other responses here. – juanpa.arrivillaga Jul 05 '17 at 00:09
  • @juanpa.arrivillaga Yep. I see your point. I've nominated for re-opening. Thanks for clarifying. – idjaw Jul 05 '17 at 00:10
  • The people who marked this question as a duplicate clearly did *not* bother to understand _what_ it is that I was asking about. As juanpa noted, I already know the difference and how to get what I want. In addition, I understand what the code is doing, so don't need the code explained to me. The downvotes don't make much sense to me, either. – code_dredd Jul 05 '17 at 00:15
  • @idjaw I'm not sure that what I'm actually asking is what you think I'm asking, given the alleged duplicate that isn't. – code_dredd Jul 05 '17 at 00:18
  • @ray If you read through the comments, you would have seen that I *understood* and corrected my mistake by *voting* to re-open. I don't have the ability to re-open the question on my own. Mistakes happen, and getting outwardly frustrated at people is not helpful. Furthermore, it is impossible to tell who down voted you. It is anonymous. FWIW, it wasn't me. It is something that, yes, does get frustrating, and sometimes it happens that the downvoter does leave a comment to explain themselves. – idjaw Jul 05 '17 at 00:22
  • @idjaw I'm aware of that; my comment is general; most others just downvoted and left. I'm also aware of the other details you mentioned; I'm not interested in _who_ downvoted, but _why_. I know that these days a lot of people don't bother stating _why_ they're downvoting something, but I'm clearly not the only one wondering why the question got downvoted in the 1st place (e.g. @JoeMoe1984). It doesn't look like a "bad" or "useless" question, in my (admittedly biased) opinion at least. There's no need to react defensively; I'm not attacking anyone out of "frustration", etc. – code_dredd Jul 05 '17 at 00:28
  • @ray From my experience, the why sometimes ends up driving you more mad than it should. It is frustrating, and I've been there too. – idjaw Jul 05 '17 at 00:31
  • As a note, the function `list_inplace_repeat` might be a place to look at in the [`list` source](https://github.com/python/cpython/blob/master/Objects/listobject.c) – Jared Goguen Jul 05 '17 at 01:07
  • @JaredGoguen: `list_repeat` is more relevant; `list_inplace_repeat` is for `*=`. – user2357112 Jul 05 '17 at 01:08
  • 1
    "especially given Python's zen of always being explicit and obvious" - well, the problem there is that as far as Python is concerned, you *were* explicit about what Python should do. You just didn't understand what you were telling it to do. While they could have made things even more explicit, splitting everything newbies might expect to make a copy into separate copy and no-copy versions, it'd get really cumbersome to explicitly write `x nc= [1, 2, 3]` or `y nc= [[None] nc* 4] c* 5` all the time, and it'd be a pretty major design change to make copying so fundamental. – user2357112 Jul 05 '17 at 01:22
  • @user2357112 *"You just didn't understand[...]"* Given I stated that I do know the semantic difference between the snippets, I think I _did_ understand. That's different from _why_ the semantic difference exists at all. The question of whether `[[]] * 5` should mean "one N-array with a single shared sub-array ref" or "one N-array with N separate sub-arrays", and which interpretation is the "obvious" one, is debatable. Clearly, the "obvious" one for me was the latter, but Python3 uses the former. However, *syntax* does not make this obvious in an *unambiguous* sense, IMHO, hence my question. – code_dredd Jul 05 '17 at 01:36
  • You know the difference now, but you didn't when you wrote the code! – user2357112 Jul 05 '17 at 01:42
  • @user2357112 I knew the difference _before_ asking the question, and I stated had that. As I noted, I asked _why_ it was done this way. For example, **if** someone had said something along the lines of *"It had to be implemented that way because the alternative would've allowed multiple abstract syntax trees to be built from the same code, hence introducing ambiguity"* I would've thought that to be a good reason, and even a better answer if it had documented those details, etc. (Note it's a hypothetical only.) – code_dredd Jul 05 '17 at 01:47
  • 2
    `[[]] * 4` just sees that the object it's multiplying by 4 is a 1-element list. It doesn't have any idea how to build copies of generic list elements, it can't reevaluate the `[]` expression because it doesn't see that expression, and special-casing specific types of list elements to make copies of them would be inconsistent. Without inconsistency or a substantial language redesign, the only option `*` has is to copy references instead of objects. – user2357112 Jul 05 '17 at 01:50
  • Because your expectations are wrong. – user2699 Jul 05 '17 at 16:39
  • This is how python builds lists of objects. Why should a list of lists be any different than a list of dicts? `t = [{}]*5` gives the exact same behavior. If you could modify `None` you could see that it is the same object in your example array, just repeated. – user2699 Jul 05 '17 at 16:44
  • @user2699 I refer you to the reply I had already posted to user2357112 several posts above this one (seven at the time of this writing, inclusive) regarding the existence of the semantic difference. Note that whether you want to use lists or dictionaries for the point I was trying to get to is not as relevant as it might appear at first. – code_dredd Jul 06 '17 at 02:31
  • 1
    @ray I also notice you've been using the term "array". But these are *not arrays*. They are *lists*. Yes, deep down inside there is a C array of Py_Object pointers, but that is an implementation detail. They are much more hefty data structures than mere arrays. They are heterogenous, resizable lists with amortized constant time `.append` and constant time indexing... – juanpa.arrivillaga Jul 06 '17 at 18:49
  • @juanpa.arrivillaga I'm aware of the distinction between an array structure and a list ADT, but it's not as meaningful in the context of my question, IMHO. – code_dredd Jul 06 '17 at 19:33
  • 1
    Why on earth would anyone downvote this question... – Daniel F Dec 27 '20 at 11:45

2 Answers2

5

When you do the following:

[[]]*n

You are first creating a list, then using the * operator with an int n. This takes whatever objects are in your list, and creates n- many repetitions of it.

But since in Python, explicit is better than implicit, you don't implicitly make a copy of those objects. Indeed, this is consistent with the semantics of Python.

Try to name a single case where Python implicitly makes a copy.

Furthermore, it is consistent with the addition on the list:

l = [1, [], 'a']

l2 = l + l + l

l[1].append('foo')

print(l2)

And the output:

[1, ['foo'], 'a', 1, ['foo'], 'a', 1, ['foo'], 'a']

Now, as noted in the comments, coming from C++ it makes sense that the above would be surprising, but if one is used to Python, the above is what one would expect.

On the other hand:

[[] for _ in range(5)]

Is a list comprehension. It is equivalent to:

lst = []
for _ in range(5):
    lst.append([])

Here, clearly, every time you are in the loop you create a new list. That is how literal syntax works.

As an aside, I almost never use the * operator on lists, except for one particular idiom I am fond of:

>>> x = list(range(1, 22))
>>> it_by_three = [iter(x)]*3
>>> for a,b,c in zip(*it_by_three):
...    print(a, b, c)
...
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172
  • 3
    Since you are trying to explain _why_, I think it should be a good idea to go into details on how `list.__mul__` has been implemented, and the reasoning behind why it was not implemented for other mutable structures like the set and list. – cs95 Jul 05 '17 at 00:13
  • 1
    This answer merely explains what the code means. I already know this. This does not really address my question, unfortunately. – code_dredd Jul 05 '17 at 00:16
  • @ray the Crux of the answer is in the italicized text, but to reiterate: in Python, explicit is better than implicit, you don't implicitly make a copy of those objects. – juanpa.arrivillaga Jul 05 '17 at 15:35
  • @cᴏʟᴅsᴘᴇᴇᴅ I actually don't think the details of how it is implemented really matter. The question is one of design choices. My answer boils down to "because Python never implicitly makes a copy, and this behavior is *consistent* with the semantics of the language". – juanpa.arrivillaga Jul 05 '17 at 16:26
  • @cᴏʟᴅsᴘᴇᴇᴅ furthermore, it is implemented for *sequence types* wherin it makes sense, mutable or immutable. So, it works for the immutable `str`, `bytes`, and `tuple` as well as the mutable `bytearray`. It doesn't work for non-sequence types. And it doesn't work for at least one sequence type where you couldn't really do it, i.e. `range` – juanpa.arrivillaga Jul 05 '17 at 16:35
1

For cpython, the relevant part of the source code is in the function list_repeat in listobject.c. An enlightening snippet is repeated below, with my added comments:

np = (PyListObject *) PyList_New(size);  // make a new PyListObject

/* some code omitted */

items = np->ob_item;          // grabs the list of pointers of the *new* object
if (Py_SIZE(a) == 1) {        // this is the case for a 1-element list being multiplied
    elem = a->ob_item[0];     // grabs the pointer of the element of the *original* object
    for (i = 0; i < n; i++) {
        items[i] = elem;      // assigns the original pointer to the new list
        Py_INCREF(elem);
    }
    return (PyObject *) np;
}

Since a PyListObject is mainly a Vector containing a list of pointers to the list elements, it is simple to assign these points as elements to the new PyListObject.

On the contrary, imagine the code if the object located at each pointer needed to be copied. It would be more complex and there would be a noticable performance hit. However, I'm not going to speculate in regards to the motivation of this design decision.

Jared Goguen
  • 8,772
  • 2
  • 18
  • 36
  • That's a good find. I had searched for PEPs or something that would explain why the design decision was made. The one I came across was [PEP-209](https://www.python.org/dev/peps/pep-0209/), but it's not relevant to my question. Have you searched for and/or seen discussions about this elsewhere (e.g. mailing list, etc)? You said: *"imagine the code if the object located at each pointer needed to be copied"*. It seems speculative; after all, it seems easy to "imagine" something along the following lines: `items[i] = Py_Copy(elem);` – code_dredd Jul 05 '17 at 04:17
  • @ray: Judging by your tag history, you're used to C++, where copying is one of the most fundamental operations in the entire language. In Python's language design, copying is extremely far from a fundamental operation. There is no C-level copying interface. There is no function to test whether an object is copyable. Objects almost never attempt to make copies of other objects, only of themselves. Assignment doesn't copy, parameter passing and return values don't make copies, and there's no concept of copy initialization. There's a `copy` module, but [cont] – user2357112 Jul 05 '17 at 05:40
  • [cont] it's not used much in the stdlib, mostly in test code, and almost always with objects of a known type already known to be copyable. In C++, it would be reasonable and expected for a container to assume its elements are copyable and to make implicit copies. In Python, such an assumption is unreasonable. – user2357112 Jul 05 '17 at 05:44
  • @user2357112 Judging from your comment (i.e. *"no C-level copying interface ... no function to test ... an object is copyable"*), it's either out of convenience (i.e. simpler to implement) or an intentional design decision (i.e. more likely given the absence of C functions to check for those things). OTOH, the [copy](https://docs.python.org/3/library/copy.html) module exists, which allows `deepcopy`ing. Your comment seems to imply that `copy.deepcopy` *must* rely on a pure Python implementation, since there's "no way" to test an object for whether it can be copied natively; can you clarify? – code_dredd Jul 05 '17 at 05:57
  • 1
    @ray: `copy.deepcopy` does use a pure-Python implementation, which you can see [here](https://github.com/python/cpython/blob/master/Lib/copy.py), but it could have been written as a C extension module that does the same thing. While we're talking, I should point out that some of the fallbacks it tries aren't very safe, so it's [prone to failure](http://ideone.com/tTvBa8) for even fairly simple self-referential structures. – user2357112 Jul 05 '17 at 06:52
  • @user2357112 Interesting that it fails with a self-reference, given that the module's documentation goes out of its way to state that it "avoids" this self-referential problem by "keeping a table of objects already copied during the current copying pass". – code_dredd Jul 05 '17 at 08:43