p = [1,2,3]
print(p) # [1, 2, 3]

q = p[:]  # supposed to make a shallow copy
q[0] = 11
print(q)  # [11, 2, 3]
print(p)  # [1, 2, 3]
# above confirms that q is not p, and is a distinct copy

del p[:]  # why is this not creating a copy and deleting that copy?
print(p)  # []

The above confirms that p[:] doesn't work the same way in these two situations, doesn't it?

Considering that in the following code, I expect to be working directly with p and not a copy of p,

p[0] = 111
p[1:3] = [222, 333]
print(p) # [111, 222, 333]
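A quick check with id() (added here purely for illustration) confirms that index and slice assignment really do mutate p in place rather than producing a new list:

```python
p = [1, 2, 3]
before = id(p)        # identity of the original list object

p[0] = 111            # index assignment mutates p in place
p[1:3] = [222, 333]   # slice assignment also mutates p in place

assert id(p) == before  # still the very same list object
print(p)  # [111, 222, 333]
```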

I feel

del p[:] 

is consistent with p[:], all of them referencing the original list but

q=p[:] 

is confusing (to novices like me), as p[:] in this case results in a new list!

My novice expectation would be that

q=p[:]

should be the same as

q=p

Why did the creators allow this special behavior to result in a copy instead?

2020
  • Because historical reasons. I agree, slices should return reference slices, not new lists. This is how it works in `numpy` and it is what the user wants for most use cases. In fact, python added the `list.copy` method due to this weirdness confusing people (since before, lists were copied with `[:]` whereas dicts and sets were copied with `.copy()`). Lots of old python returned inefficient copies from methods (see `dict.items` and `zip`) and this was fixed in python 3, but list slices remained as copies. I guess it was considered too much hassle to have to replace every `[a:b]` with `[a:b].copy()` – FHTMitchell Jun 26 '19 at 23:35
  • @FHTMitchell: `list.copy` wasn't added due to confusion, it was added to make `list` more compatible with other collection types (`set`/`dict`, which had to have `copy` because slicing doesn't work on them). Returning reference slices makes it *much* harder to reason about code, and *much* easier to accidentally do stuff like modify caller arguments by accident. `numpy` did it their way for performance purposes; Python in general is more concerned with making it easy to write correct code, with performance being a secondary concern. – ShadowRanger Jun 27 '19 at 14:49
  • @shadowranger sooooo it was done to reduce confusion. Let me be clear, I think the reference slices should be immutable in much the same way dict views work. Literally never had a problem with those. You want a new list? Simply do `list(X[a:b])`. – FHTMitchell Jun 27 '19 at 14:50
  • @FHTMitchell: No it was not done to reduce confusion, it was done to increase flexibility. The idea was they wanted to make it possible for a function to duck type copying the common collections by just calling `.copy()` on whatever it received. If you required a sequence, the correct solution is still using `seq[:]`, because that handles both mutable and immutable sequences. – ShadowRanger Jun 27 '19 at 14:54
  • @shadowranger Ok I agree, you need a method on Sequence that isn't an overload of `__getitem__` for copying. That is another reason. But having read the actual discussion on dev.python, a big reason was to reduce the weirdness (and inconsistency) of the standard way of copying a list, which was `[:]`. I fail to see why anyone would use `[:]` over `copy()` these days precisely because of duck typing. – FHTMitchell Jun 27 '19 at 14:56
  • If you want to make a copy of a list, I recommend `q = list(p)` over `q = p[:]` for clarity. – marcelm Jun 27 '19 at 18:49
  • @FHTMitchell: dict views have some very weird behaviours that make it really hard to use the same model for lists. For one, dict views will break if you mutate the dictionary. Since dict views are normally only used short term during iteration and since dicts are unordered, it's much less of an issue in that context, but a list slice can and will be passed around. Also, an immutable list slice can unexpectedly change if the underlying list changes. If someone does a `list.insert()`, how should a list view behave? Lazy slicing by default has its appeals but it would make the language harder for beginners. – Lie Ryan Jun 27 '19 at 23:03
  • @FHTMitchell: there are lots of these strange cases, like how a list view should behave if someone did `lst[:] = []` on the underlying list, which would cause all slices of the view to become invalid. Numpy avoids these issues by deciding that their array dimension is immutable, but the dimension of python's list is decidedly mutable. – Lie Ryan Jun 27 '19 at 23:14
  • @LieRyan I would expect that to work exactly the same way normal list slices work. `x = [1,2,3]; assert x[100:200] == []`. So if slices were views then `x = [1,2,3]; y = x[1:]; x[:] = []; assert y == []`. Also numpy dimensions aren't immutable; the `size` of the array is immutable. That's why `x = np.arange(6); y = x[1:]; x.shape = (2, 3); assert np.all(y == [1, 2, 3, 4, 5])` works. Yes, I would support lazy slicing, exactly how dict views work. No one kicks up a fuss at the idea of lazy iterators despite the fact that "the underlying mutable list can change". – FHTMitchell Jun 28 '19 at 10:33
  • @FHTMitchell: actually, dict will complain if its size changed during iteration; additionally iterators tend to have a short lifetime because you're done with it once it's exhausted, slices can be used infinitely. Also, yes numpy array size is immutable is exactly the reason why they can lazy slice safely, this assumption makes operations on lazy views well defined and a lot easier to understand and reason about. – Lie Ryan Jul 16 '19 at 03:19

6 Answers


del and assignment are designed consistently; they're just not designed the way you expected them to be. del never deletes objects, it deletes names/references (object deletion only ever happens indirectly; it's the refcount/garbage collector that deletes objects); similarly, the assignment operator never copies objects, it always creates or updates names/references.

The del statement and the assignment operator take a reference specification (similar to the concept of an lvalue in C, though the details differ). This reference specification is either a variable name (plain identifier), a __setitem__ key (expression in square brackets), or a __setattr__ name (identifier after a dot). This lvalue is not evaluated like an expression, as doing so would make it impossible to assign to or delete anything.

Consider the symmetry between:

p[:] = [1, 2, 3]

and

del p[:]

In both cases, p[:] works identically because they are both evaluated as an lvalue. On the other hand, in the following code, p[:] is an expression that is fully evaluated into an object:

q = p[:]
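The asymmetry can be sketched with id(), which reports object identity (a small illustration added here, not part of the original answer):

```python
p = [1, 2, 3]
pid = id(p)

p[:] = [4, 5, 6]       # lvalue context: p is mutated in place
assert id(p) == pid

q = p[:]               # rvalue context: the slice is evaluated to a new list
assert id(q) != pid
assert q == p

del p[:]               # lvalue context again: empties p in place
assert id(p) == pid
assert p == []
assert q == [4, 5, 6]  # q is unaffected; it was a separate object
```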
Lie Ryan
  • Interesting. So, I wonder why numpy didn't follow the same reasoning? – 2020 Jun 27 '19 at 00:28
  • `p[:]` is similar but different in numpy. See the [docs](https://www.numpy.org/devdocs/user/quickstart.html#view-or-shallow-copy) for views. "Slicing an array returns a view of it" and views are "a new array object that looks at the same data" – Jab Jun 27 '19 at 00:37
  • @brainOverflow numpy wants to be as efficient as possible and also as terse and easy to read as possible. Since numpy code will be optimized, why make the most common code more verbose? numpy was not born as a generalized framework; it was designed for people with a certain level of expertise, so the choices they made are different from those that Guido made when designing the python language – Bakuriu Jun 27 '19 at 18:47
  • Standard python and numpy work exactly the same in regard to assignment and deletion operations; their difference isn't in assignment and deletion, but in how slicing is evaluated in an rvalue context. A numpy array returns a view when sliced; standard python returns a copy. This small difference can make a huge impact in how operations on standard python and numpy behave; numpy will overwrite the original data where standard python doesn't. But if you understand the python object model and that numpy slices are views, those differences in behaviour make sense and are actually quite simple. – Lie Ryan Jun 27 '19 at 22:49

del on iterator is just a call to __delitem__ with the index as argument, just like the subscript [n] is a call to the __getitem__ method on the instance with index n.

So when you call p[:] you are creating a sequence of items, and when you call del p[:] you map that del/__delitem__ onto every item in that sequence.
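In CPython this can be observed directly with a toy list subclass that records the special-method calls it receives (the class name here is illustrative, not from the answer); note the whole slice is passed as a single slice object:

```python
class Logged(list):
    """Toy list subclass (hypothetical) that records special-method calls."""
    calls = []

    def __getitem__(self, key):
        Logged.calls.append(("getitem", key))
        return super().__getitem__(key)

    def __delitem__(self, key):
        Logged.calls.append(("delitem", key))
        super().__delitem__(key)

p = Logged([1, 2, 3])
q = p[:]       # one __getitem__ call with slice(None, None, None)
del p[:]       # one __delitem__ call with slice(None, None, None)
print(Logged.calls)
# [('getitem', slice(None, None, None)), ('delitem', slice(None, None, None))]
```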

ipaleka
  • „`del` on iterator“ seems to be wrong because `del` on iterators isn't defined. – BlackJack Jun 27 '19 at 13:19
  • He didn't say "del on iterator", he said "del on sequence". And even if it's not implementation-correct (you don't map anything), it is a manner of thinking about it that'll generally lead to the correct conclusions. – Gloweye Jun 28 '19 at 07:01
  • @JaccovanDorp The answer _literally_ starts with „`del` on iterator is …“, so ipaleka _did_ say/wrote that. – BlackJack Jul 02 '19 at 13:24
  • Sorry, I am unable to reconstruct my earlier train of thought. – Gloweye Jul 02 '19 at 14:58

As others have stated, del p[:] deletes all items in p but will not affect q. To go into further detail, the list docs refer to just this:

All slice operations return a new list containing the requested elements. This means that the following slice returns a new (shallow) copy of the list:

>>> squares = [1, 4, 9, 16, 25]
...
>>> squares[:]
[1, 4, 9, 16, 25]

So q=p[:] creates a (shallow) copy of p as a separate list, and upon further inspection it does point to a completely separate location in memory.

>>> p = [1,2,3]
>>> q=p[:]
>>> id(q)
139646232329032
>>> id(p)
139646232627080

This is explained better in the copy module:

A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.

Although the del statement is performed recursively on lists/slices:

Deletion of a target list recursively deletes each target, from left to right.

So if we use del p[:] we are deleting the contents of p by iterating over each element, whereas q is not altered; as stated earlier, it references a separate list, although one with the same items:

>>> del p[:]
>>> p
[]
>>> q
[1, 2, 3]

In fact this is also referenced in the list docs, under the list.copy and list.clear methods:

list.copy()

Return a shallow copy of the list. Equivalent to a[:].

list.clear()

Remove all items from the list. Equivalent to del a[:].
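Those stated equivalences are easy to check directly (a quick sketch added for illustration):

```python
a = [1, 2, 3]
b = a.copy()
assert b == a[:] == [1, 2, 3]   # copy() yields the same result as a[:]
assert b is not a               # and it is a distinct object

c = [1, 2, 3]
cid = id(c)
c.clear()                       # same effect as del c[:]
assert c == [] and id(c) == cid # emptied in place, same object
```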

Jab
  • The `del` statement isn't performed recursively on that one list. The documentation talks about the target list, not the one list+slice in the example. Consider `del parrot, spam, grail` – here the target list has three elements that are deleted left to right, first `parrot`, then `spam`, then `grail`. In `del p[:]` there is just one target that gets deleted: `p[:]`. – BlackJack Jun 27 '19 at 13:25

Basically the slice-syntax can be used in 3 different contexts:

  • Accessing, i.e. x = foo[:]
  • Setting, i.e. foo[:] = x
  • Deleting, i.e. del foo[:]

And in these contexts the values put in the square brackets just select the items. This is designed so that the "slice" is used consistently in each of these cases:

  • So x = foo[:] gets all elements in foo and assigns them to x. This is basically a shallow copy.

  • But foo[:] = x will replace all elements in foo with the elements in x.

  • And when deleting del foo[:] will delete all elements in foo.

However, this behavior is customizable, as explained in 3.3.7. Emulating container types:

object.__getitem__(self, key)

Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.

Note

for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence.

object.__setitem__(self, key, value)

Called to implement assignment to self[key]. Same note as for __getitem__(). This should only be implemented for mappings if the objects support changes to the values for keys, or if new keys can be added, or for sequences if elements can be replaced. The same exceptions should be raised for improper key values as for the __getitem__() method.

object.__delitem__(self, key)

Called to implement deletion of self[key]. Same note as for __getitem__(). This should only be implemented for mappings if the objects support removal of keys, or for sequences if elements can be removed from the sequence. The same exceptions should be raised for improper key values as for the __getitem__() method.

(Emphasis mine)

So in theory any container type could implement this however it wants. However, many container types follow the list implementation.
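For instance, a minimal custom container (a hypothetical sketch, not from the docs) shows that the same slice object reaches all three hooks:

```python
class Demo:
    """Hypothetical container: records which hook was called and with what key."""

    def __getitem__(self, key):        # accessing: x = d[1:3]
        return ("get", key)

    def __setitem__(self, key, value): # setting: d[1:3] = x
        self.last = ("set", key, value)

    def __delitem__(self, key):        # deleting: del d[1:3]
        self.last = ("del", key)

d = Demo()
assert d[1:3] == ("get", slice(1, 3))        # the slice arrives as a slice object
d[1:3] = "x"
assert d.last == ("set", slice(1, 3), "x")
del d[1:3]
assert d.last == ("del", slice(1, 3))
```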

MSeifert

I'm not sure if you want this sort of answer. In words, p[:] means "iterate through all elements of p". If you use it in

q=p[:]

Then it can be read as "take all elements of p and assign them to q". On the other hand, using

q=p

Just means "assign the address of p to q", or "make q a pointer to p", which is confusing if you come from other languages that handle pointers explicitly.
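The difference between the two assignments can be checked quickly (a small sketch, not part of the original answer):

```python
p = [1, 2, 3]
q = p          # q is just another name for the same list
q[0] = 99
assert p == [99, 2, 3]   # the mutation is visible through both names

r = p[:]       # r is a new list holding the same elements
r[0] = 0
assert p == [99, 2, 3]   # p is unaffected by changes to r
```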

Therefore, using it in del, like

del p[:]

Just means "delete all elements of p".

Hope this helps.

Seraph Wedd

Historical reasons, mainly.

In early versions of Python, iterators and generators weren't really a thing. Most ways of working with sequences just returned lists: range(), for example, returned a fully-constructed list containing the numbers.

So it made sense for slices, when used on the right-hand side of an expression, to return a list. a[i:j:s] returned a new list containing selected elements from a. And so a[:] on the right-hand side of an assignment would return a new list containing all the elements of a, that is, a shallow copy: this was perfectly consistent at the time.

On the other hand, brackets on the left side of an expression always modified the original list: that was the precedent set by a[i] = d, and that precedent was followed by del a[i], and then by del a[i:j].

Time passed, and copying values and instantiating new lists all over the place was seen as unnecessary and expensive. Nowadays, range() returns a lazy sequence object that produces each number only as it's requested, and iterating over a slice could potentially work the same way, but the idiom of copy = original[:] is too well-entrenched as a historical artifact.

In NumPy, by the way, this isn't the case: ref = original[:] will make a view rather than a shallow copy, which is consistent with how del and assignment to arrays work.

>>> a = np.array([1,2,3,4])
>>> b = a[:]
>>> a[1] = 7
>>> b
array([1, 7, 3, 4])

Python 4, if it ever happens, may follow suit. It is, as you've observed, much more consistent with other behavior.

Draconis
  • It still makes sense to create a copy for slices. I don't see why a Python 4 should follow suit. – BlackJack Jun 27 '19 at 13:28
  • @BlackJack Iterating over other data structures generally doesn't create a copy: if you want a separate copy of a dictionary's keys, for example, you need to use `list(d.keys())`. It would make sense to me if you had to do the same for slices: `list(a[:])` or just `a.copy()`. – Draconis Jun 27 '19 at 16:45
  • I don't get your point. Iterating over lists doesn't create a copy either. Also, whether `original[:]` creates a view or a copy isn't really defined in Numpy; just that if you really need an independent copy you have to explicitly ask for one. Slicing in Numpy sometimes gives a view and sometimes gives a copy. For instance there are circumstances where a slice of a slice can't be represented with offset, dimensions, and strides as a view anymore. – BlackJack Jun 27 '19 at 19:45
  • This "historical" reason is still as relevant in today's python as it was back then. The reason a numpy array can return a view is that the dimension of a numpy array is not mutable (the array is not resizable). Python's lists are fully mutable, and having slicing return views by default would make the language much harder to use correctly for the common use cases, and potentially slower due to the additional indirection. – Lie Ryan Jun 27 '19 at 23:50