When and why does the interpreter unravel by assuming same length sublists?

Question

I'm impressed by and enjoy the fact that a simple Python for statement can easily unravel a list of lists, without the need for numpy.unravel or an equivalent flatten function. However, the trade-off is now that I can't access elements of a list like this:

for a,b,c in [[5],[6],[7]]:
     print(str(a),str(b),str(c))
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 3, got 1)

and instead, this works, up until the length-1 [5]:

for a,b,c in [[1,2,3],[4,5,6],[7,8,9],[0,0,0], [5]]:
     print(a,b,c)

1 2 3
4 5 6
7 8 9
0 0 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not enough values to unpack (expected 3, got 1)

Logically, it doesn't make sense to assume that a list would have a fixed number of elements. How come then, Python allows us to assume that a list of lists would always have the same number of elements?

I'd like to be aware of what Python expects, because I want to anticipate wrongly formatted lists/sublists.

I've poked around Python documentation and Stackoverflow, but haven't found the reasoning or how the interpreter is doing this.

My guess is that flattening same-length arrays is such a common occurrence (e.g. machine learning dimensionality reduction, matrix transformations, etc.), that there's utility in providing this feature at the trade-off of being unable to do what I've tried above.

`for a,b,c in [[5],[6],[7]]:` has _absolutely nothing_ to do with numpy. That's a Python list. Nor does `for a,b,c in [[1,2,3],[4,5,6],[7,8,9],[0,0,0], [5]]:` — roganjosh, Mar 15 '19 at 20:42
First, you aren't dealing with a `numpy` behavior. This is basic Python iteration. Secondly, you appear to be confusing two items - the `for` iteration, and the `a,b,c` unpacking. Unpacking is inflexible when it comes to the number of items it expects, in this case 3 (one value for each variable). Also it doesn't let you assume anything - it raises a runtime `ValueError` if you get it wrong. (This mismatch isn't a syntax error.) — hpaulj, Mar 15 '19 at 20:43
Python doesn’t assume anything. It lets you unpack any *iterable*. It’s up to you to ensure your iterables have the expected number of items. — deceze, Mar 15 '19 at 20:45
"How come then, Python allows us to assume that a list of lists would always have the same number of elements?" - same reason it lets you assume a list has at least 3 elements when you do `l[2]`, or why it lets you assume every element of a list is a number when you write `for x in l: s += x`. Why wouldn't it let you? — user2357112, Mar 15 '19 at 20:54
There are ways to "unravel" sequences with unequal sub-sequences. See the question [zip_longest without fillvalue](https://stackoverflow.com/questions/38054593/zip-longest-without-fillvalue). — martineau, Mar 15 '19 at 22:40
I'm sorry if my question was unclear. I wasn't assuming this functionality had any Numpy reference. I was just trying to draw parallels to the functionality of Numpy's np.unravel, as a guess to what the interpreter is doing. Thanks @martineau for clarifying the question. — Dave Liu, Mar 18 '19 at 17:47

wim · Answer 1 · 2019-03-15T21:16:54.583

The interpreter always assumes the length is matching when making an unpacking assignment, and just crashes with ValueError if it doesn't match. A for-loop is actually very similar to a kind of "repeated assignment statement", with the LHS being the free variable(s) of the loop and the RHS being an iterable container yielding the successive value(s) to use in each step of the iteration.

One assignment per iteration, made at the beginning of the loop body - in your case, it's an unpacking assignment, which binds multiple names.

So, in order to be properly equivalent to the second example, your first example which was:

for a,b,c in [[5],[6],[7]]:
    ...

should have been written instead:

for a, in [[5],[6],[7]]:
    ...

There is no "anticipation", and there can't be because (in the general case) you may be iterating over anything, e.g. data streaming in from a socket.

In order to fully grasp how for-loop flow works, the analogy with assignment statements is very useful. Anything that you can use on the left hand side of an assignment statement, you can use as the target in a for-loop. For example, this is equivalent to setting d[1] = 2 etc. in a dict, and should give the same result as dict(RHS):

>>> d = {}
>>> for k, d[k] in [[1, 2], [3, 4]]: 
...     pass 
...
>>> d
{1: 2, 3: 4}

It's just a bunch of assignments, in a well-defined order.

Not quite; it would need to be: `for [a],[b],[c] in [[[5],[6],[7]]]:` (note extra brackets on thing being iterated). Otherwise it would be trying to unpack `[5]` to `[a],[b],[c]`. — ShadowRanger, Mar 15 '19 at 20:48
Your post-edit approach also works :-). As does `for [a] in [[5],[6],[7]]:`. I'll stop before I get into [the many ways of unpacking single element lists...](https://stackoverflow.com/a/33161467/364696) :-) — ShadowRanger, Mar 15 '19 at 20:53

ShadowRanger · Answer 2 · 2019-05-23T02:03:03.413

Python doesn't know, you just told it to expect three elements by unpacking to three names. The ValueError says "you told us three, but we found a sub-iterable that didn't have three elements, and we don't know what to do".

Python isn't really doing anything special to implement this; aside from special cases for built-in types like tuple (and probably list), the implementation is just to iterate the sub-iterable the expected number of times and dump all the values found on the interpreter stack, then store them to the provided names. It also tries to iterate one more time (expecting StopIteration) so you don't silently ignore extra values.
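
As a rough illustration of the paragraph above, here is a pure-Python sketch of that logic. The function name unpack_exactly is hypothetical, and the error messages are only modelled on the ones shown in the question; the real work happens inside the interpreter, with fast paths for built-in types that this sketch ignores.

```python
def unpack_exactly(iterable, expected):
    """Hypothetical sketch of what `a, b, c = iterable` does for a generic iterable."""
    it = iter(iterable)
    values = []
    # Pull exactly `expected` values; running out early is an error.
    for _ in range(expected):
        try:
            values.append(next(it))
        except StopIteration:
            raise ValueError(
                f"not enough values to unpack (expected {expected}, got {len(values)})"
            ) from None
    # Try one more step: leftover values are an error, not silently dropped.
    try:
        next(it)
    except StopIteration:
        return values
    raise ValueError(f"too many values to unpack (expected {expected})")


a, b, c = unpack_exactly([1, 2, 3], 3)
```

Calling unpack_exactly([5], 3) raises ValueError, mirroring what happens to the [5] sublist in the question.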

For limited cases, you can be flexible by having one of the unpack names preceded with a *, so you capture all the "didn't fit" elements into that name (as a list). That lets you set a minimum number of elements while allowing more, e.g. if you really only need the first element from your second example, you could do:

for a, *_ in [[1,2,3],[4,5,6],[7,8,9],[0,0,0], [5]]:
    print(a)

where _ is just a name that, by convention, means "I don't actually care about this value, but I needed a placeholder name".

Another example would be when you want the first and last element, but otherwise don't care about the middle:

for first, *middle, last in myiterable:
    ...

But otherwise, if you need to handle variable length iterables, don't unpack, just store to a single name and iterate that name manually in whatever way makes sense to your program logic.
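
One possible way to follow that advice, if the goal is to anticipate wrongly formatted sublists, is to bind each sublist to a single name and check its length before unpacking. This is only a sketch: the helper name partition_by_length and the choice to collect malformed rows (rather than raise or log) are assumptions for illustration.

```python
def partition_by_length(rows, expected):
    """Split rows into those with exactly `expected` items and the rest."""
    well_formed, malformed = [], []
    for row in rows:          # bind the whole sublist to one name, no unpacking yet
        items = list(row)     # materialize so len() works for any iterable
        if len(items) == expected:
            well_formed.append(tuple(items))
        else:
            malformed.append(items)
    return well_formed, malformed


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 0, 0], [5]]
good, bad = partition_by_length(rows, expected=3)

for a, b, c in good:          # safe: every row in `good` has exactly 3 items
    print(a, b, c)
print("skipped:", bad)
```

Catching ValueError around the unpacking would also work, but checking the length first keeps the normal path free of exception handling.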

Alexandru Martin · Accepted Answer · 2019-03-18T19:31:01.477

Python does not assume same-length lists. Unpacking works on any iterable, not only lists, so nothing about the lengths can be known in advance.

When you iterate with for a,b,c in [[1,2,3],[4,5,6],[7,8,9],[0,0,0], [5]], Python gets an iterator over the outer list, and that iterator returns each sublist in turn.

So each step of that for loop is equivalent to:

l = [[1,2,3],[4,5,6],[7,8,9],[0,0,0], [5]]

l_iter = iter(l)

a,b,c = next(l_iter)

Each call to next(l_iter) returns the next element of the list. Once the list is exhausted, it raises a StopIteration exception, according to the Python iteration protocol, and the loop ends.

This means:

a,b,c = [1,2,3]
a,b,c = [4,5,6]
a,b,c = [7,8,9]
a,b,c = [0,0,0]
a,b,c = [5]

As you can see, Python can't unpack [5] into a,b,c because it holds only one value.
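
Putting those steps together, the whole loop can be sketched as an explicit while loop over the iterator. The function name loop_with_unpacking is hypothetical, and this is an approximation of the loop's behaviour rather than what the interpreter literally executes.

```python
def loop_with_unpacking(rows):
    """Approximate expansion of `for a, b, c in rows: ...`, collecting each triple."""
    seen = []
    it = iter(rows)
    while True:
        try:
            item = next(it)   # fetch the next sublist
        except StopIteration:
            break             # iterator exhausted: the for loop ends normally
        a, b, c = item        # unpacking assignment; ValueError unless item has exactly 3 values
        seen.append((a, b, c))
    return seen


print(loop_with_unpacking([[1, 2, 3], [4, 5, 6]]))
```

The length check happens only when each item is unpacked, which is why the first four sublists in the question print before the error appears.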

edited Mar 18 '19 at 19:31

answered Mar 15 '19 at 20:49


I'd suggest illustrating with `next(l_iter)`, not `l_iter.next()`; the latter is wrong in Python 3 (where the name is `__next__`), while the top level `next()` built-in function works on 2.7 and 3.x (and is the generally approved approach, in the same way `len(seq)` is preferred over `seq.__len__()`, even though technically both work). – ShadowRanger Mar 15 '19 at 20:50
