
I tried this code at the REPL in Python 3.8:

>>> a = list(range(10))
>>> a[:] = (i for i in a for _ in range(2))

We are assigning to elements of a from a generator that is itself iterating over a, and there isn't even a one-to-one correspondence between the elements (each input element produces two outputs). That looks an awful lot like modifying the list while iterating over it, so I expected this to go poorly in one way or another.

Instead, it works exactly as naively expected:

>>> a
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

After a moment's thought, it seems that Python has to make some kind of temporary copy before actually doing the assignment. After all, the inserted slice could be a different size from the replaced slice (as long as it isn't an extended slice), which would require shifting the elements that follow the slice; and there's no way to know how far to shift them without evaluating the whole generator.
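One way to check this in CPython is to have the generator record the length of the list every time it resumes; if the list were being modified while the generator ran, the recorded lengths would change. This is a sketch: `doubled` and `seen_lengths` are hypothetical names for illustration, and the output reflects CPython's `list`, not a documented guarantee.

```python
a = list(range(5))
seen_lengths = []  # hypothetical log: len(a) observed each time the generator resumes


def doubled(seq):
    """Yield every element of seq twice, recording len(seq) as we go."""
    for x in seq:
        seen_lengths.append(len(seq))
        yield x
        yield x


a[:] = doubled(a)
print(a)             # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(seen_lengths)  # [5, 5, 5, 5, 5]: a kept its original length throughout
```

Every recorded length is 5, i.e. the generator ran to completion against the unmodified list before any element of `a` was replaced.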

However, it's easy to imagine an implementation of that which would still run into trouble. For example: copy the elements after the slice to a temporary; mark everything from the beginning of the slice onwards as unused; append elements from the generator using the usual .append logic; finally, .extend with the temporary. (Of course, that wouldn't work for extended slices, but extended slices can't resize the list anyway.) With that implementation, our example would go wrong immediately, because the list would be emptied before the generator even started being used: the generator's list iterator would find nothing left to iterate over, and a would silently end up empty.
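The hypothetical strategy can be simulated in pure Python and compared with a copy-first strategy. Both function names below are made up for illustration; `naive_slice_assign` is the strategy imagined in the question, not anything CPython does, and `copy_first_slice_assign` only mimics what `list` appears to do.

```python
def naive_slice_assign(lst, start, stop, iterable):
    """Hypothetical strategy from the question; NOT how CPython does it."""
    tail = lst[stop:]        # copy the elements after the slice
    del lst[start:]          # drop everything from the start of the slice onwards
    for item in iterable:    # consume the iterable lazily, appending as we go
        lst.append(item)
    lst.extend(tail)         # put the saved tail back


def copy_first_slice_assign(lst, start, stop, iterable):
    """Materialize the iterable before touching lst."""
    items = list(iterable)
    lst[start:stop] = items


a = list(range(10))
naive_slice_assign(a, 0, len(a), (i for i in a for _ in range(2)))
print(a)  # []

b = list(range(10))
copy_first_slice_assign(b, 0, len(b), (i for i in b for _ in range(2)))
print(b)  # every element doubled, as in the question
```

In the naive version nothing raises: the generator's iterator over `a` sees an empty list and simply stops, so the damage is silent.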


So: is the actual behaviour reliable/guaranteed? Is it version-specific? How exactly does Python implement the slice assignment?
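For the last part, it helps to see which object is responsible. The assignment statement itself just calls the target's `__setitem__` with a `slice` object and the raw, unconsumed generator; any copying happens inside `list`'s own implementation. A minimal sketch, using a hypothetical `RecordingList` subclass that logs what it receives:

```python
calls = []  # hypothetical log of what __setitem__ receives


class RecordingList(list):
    """A list subclass that logs the arguments of every item/slice assignment."""

    def __setitem__(self, key, value):
        calls.append((key, type(value).__name__))
        super().__setitem__(key, value)  # defer to list's own slice assignment


r = RecordingList(range(3))
r[:] = (i for i in r for _ in range(2))
print(calls)  # [(slice(None, None, None), 'generator')]
print(r)      # [0, 0, 1, 1, 2, 2]
```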

Karl Knechtel
  • It's reliable for lists, but classes choose how to implement assignment to a slice and may do something different. – tdelaney Aug 18 '22 at 21:08
  • @tdelaney Is it reliable because it's documented/guaranteed somewhere, or just because the Python people wouldn't dare to change it because it would certainly break some code? Only thing I found in the docs is "[Finally, the sequence object is asked to replace the slice with the items of the assigned sequence](https://docs.python.org/3/reference/simple_stmts.html#assignment-statements)" (meaning Python the language doesn't care, it's entirely up to the target object how to handle it.) – Kelly Bundy Aug 18 '22 at 21:13
  • @KellyBundy From the docs, [Mutable sequence types](https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types): `s[i:j] = t`: _slice of s from i to j is replaced by the contents of the iterable t_. `list` is behaving as it's documented to behave. But this isn't enforced by the language itself, so it's not guaranteed for all classes out there in the wild. – tdelaney Aug 18 '22 at 21:30
  • @tdelaney D'oh, I had looked at the wrong table. Anyway, I still don't think that guarantees this behaviour, i.e., first building a temp list of `t` and then *afterwards* assigning that. About `s += t`, the doc there similarly says *"extends s with the contents of t"*, but in that case [doesn't](https://stackoverflow.com/q/72155476/12671057) work like with such a temp list. – Kelly Bundy Aug 19 '22 at 07:39
  • Ah, so it's `.extend` that's working differently from slice assignment, then? I tried `powers = [1]; powers[1:] = map(operator.mul, [2] * 10, powers)` and `powers` is just `[1, 2]` - the first added element isn't picked up in the next iteration of the `map` generator. – Karl Knechtel Aug 19 '22 at 08:12
  • `.extend` and `+=`, yes. The full quote is actually **"extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)"**, so this difference in behavior seems to be a reason it says **"for the most part"**. – Kelly Bundy Aug 19 '22 at 08:31
  • I don't think this is guaranteed by the spec, but I don't think it's something that would ever change. `extend()` can be rewritten as both `a[len(a):] = b` and `a += b`. There seems to be a very strong preference that `a += b` have virtually the same effect as `a = a + b`. Indeed, the language will do that conversion if no `__iadd__` method is defined for `a`. Operators on tuples and lists should have the same *basic* effect. So after `a += b`, a should have the same content if it is a list or tuple. And from there, `extend()`, `+=` and `a[len(a):] = b` should all behave the same way. – Dunes Aug 19 '22 at 10:25
  • @Dunes But they *don't* all behave the same way. Did you look at the question I linked to? – Kelly Bundy Aug 19 '22 at 10:35
  • @KellyBundy That's a bug. When `extend()` and `+=` have a direct reference to the list, then they will behave as the slice assignment does. Given `l = [0]`, compare `l += l` with `l += iter(l)`. The former behaves like the slice assignment, and the second gives a memory error. I think that makes it obvious what is the intended behaviour, and which is the edge case that hasn't been covered properly. – Dunes Aug 19 '22 at 13:54
  • @Dunes Hmm, so I have you claiming it's a bug and *Raymond Hettinger* telling me *"I do think it is the expected behavior"*. I know who I'm going to believe... – Kelly Bundy Aug 19 '22 at 14:03
  • @KellyBundy Raymond Hettinger is talking about the behaviour of iterators. This is about the behaviour of `extend()` and `+=`. This is UB, because what constitutes the "contents" of the iterator depends on an implementation detail of `extend()` / `+=`. However, I think the behaviour of `l[len(l):]=l` and the special-casing of `l+=l` makes the intention clear. Note that he uses the for loop, not `extend()` for the guaranteed behaviour. – Dunes Aug 20 '22 at 14:42
  • @Dunes Raymond was talking about the behaviour I asked about, i.e., the behaviour of `+=` on a list with an iterator. And the special-casing of `l+=l` rather suggests the opposite, that with iterators (or *anything* other than the same list object, really) it's intended to behave as it does. Otherwise they'd just handle everything like `l+=l`. Would most likely be simpler/less code. But they explicitly made the choice to only do that for that special case. – Kelly Bundy Aug 20 '22 at 14:56
  • Perhaps I haven't made myself clear: it's the UB that makes it a bug. C and C++ are okay with UB in the language spec. You just shouldn't rely on whatever the compiler happens to produce. Whereas Python accepts documentation updates for missing or misleading documentation, including undefined behaviour. This is especially important as CPython is provided as a *reference* implementation, not just an implementation of the spec. – Dunes Aug 20 '22 at 14:57
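A side-by-side sketch of the difference discussed in the comments, built from Karl Knechtel's `powers` example (the variable names are illustrative). The `extend()` result reflects CPython's implementation, which appends each item while the iterator is still running; as the thread shows, whether that is intended or merely unspecified is disputed, so it is not something to rely on.

```python
import operator

# Slice assignment: list consumes the whole map() before changing anything,
# so map() only ever sees the original one-element list.
sliced = [1]
sliced[1:] = map(operator.mul, [2] * 10, sliced)
print(sliced)  # [1, 2]

# extend() (and +=): CPython appends each item as soon as map() produces it,
# so map()'s iterator over the same list keeps finding the new elements.
extended = [1]
extended.extend(map(operator.mul, [2] * 10, extended))
print(extended)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

The second `map()` stops only because `[2] * 10` runs out; with an unbounded first argument it would never terminate, which is the `l += iter(l)` memory error mentioned above.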

1 Answer


I'm pretty sure I can at least verify that either a temporary list (or tuple, I suppose) of the generator's contents gets made, or the result of the assignment is computed in a temporary buffer and then swapped back into the list object. Either way, this should make the observed behaviour in the question guaranteed.

Observe what happens if we try the same thing with an extended slice:

>>> a = list(range(10))
>>> a[::2] = (0 for _ in range(10))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: attempt to assign sequence of size 10 to extended slice of size 5
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The exception reports the length of the sequence being assigned (10), so Python must have determined it by evaluating the entire generator. It could conceivably have assigned from the generator, found that there were extra elements, and only then consumed the rest of the generator to determine the total length for the error message; but if it worked that way, a would have been modified, and it isn't.
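The same argument can be checked directly. In this sketch (`gen`, `message` and `leftover` are illustrative names), the generator has nothing left after the failed assignment and the list is unchanged, which is consistent with the whole iterable being materialized before the size check. The exact wording of the error message may differ between versions, so only the sizes are relied on.

```python
a = list(range(10))
gen = (0 for _ in range(10))
message = None

try:
    a[::2] = gen  # extended slice of size 5, but the generator yields 10 items
except ValueError as exc:
    message = str(exc)

print(message)                     # names both sizes, 10 and 5
leftover = next(gen, "exhausted")
print(leftover)                    # exhausted: the generator was fully consumed
print(a)                           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: a was not modified
```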

Karl Knechtel