17

I use Python's zip function a lot in my code (mostly to create dicts like below)

dict(zip(list_a, list_b)) 

I find it really useful, but sometimes it frustrates me because I end up with a situation where list_a is a different length to list_b. zip just goes ahead and pairs up the two lists until the shorter one is exhausted, silently ignoring the rest of the longer list. This seems like it should be treated as an error in most circumstances, and according to the Zen of Python, errors should never pass silently.
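A minimal sketch of the behaviour described above. The contents of list_a and list_b are made-up sample data, chosen only to show the truncation:

```python
# Hypothetical sample data: list_b is one element shorter than list_a.
list_a = ["a", "b", "c"]
list_b = [1, 2]

# zip stops at the end of the shorter list, so "c" is dropped without any warning.
result = dict(zip(list_a, list_b))
print(result)  # {'a': 1, 'b': 2}
```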

Given that this is such an integral function, I'm curious as to why it's been designed this way? Why isn't it treated as an error if you try to zip together two lists of different lengths?

chris
  • [Here is the RFC proposing `zip` for Python](https://www.python.org/dev/peps/pep-0201/#lockstep-for-loops), explaining why this choice was made. – Akshat Mahajan Sep 22 '16 at 00:38
  • Because you sometimes want to `zip` iterables with different lengths, and sometimes you use generators and can't know the length beforehand. When would you throw the error? As soon as one is exhausted but the other isn't? – MSeifert Sep 22 '16 at 00:40

3 Answers

16

Reason 1: Historical Reason

zip truncates to its shortest argument because it was meant to improve on the way map(None, ...) handled unequal-length arguments. That improved handling is a large part of why zip exists at all.

Here's how you did the equivalent of zip before it existed (Python 2):

>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]

This is terribly unintuitive, and it handles unequal-length lists badly. This was a major design concern, which you can see plainly in PEP 201, the proposal that first introduced zip:

While the map() idiom is a common one in Python, it has several disadvantages:

  • It is non-obvious to programmers without a functional programming background.

  • The use of the magic None first argument is non-obvious.

  • It has arbitrary, often unintended, and inflexible semantics when the lists are not of the same length - the shorter sequences are padded with None :

    >>> c = (4, 5, 6, 7)
    >>> map(None, a, c)
    [(1, 4), (2, 5), (3, 6), (None, 7)]

So, no, this behaviour would not be treated as an error - it is why it was designed in the first place.


Reason 2: Practical Reason

Because it is pretty useful, is clearly specified and doesn't have to be thought of as an error at all.

By allowing unequal lengths, zip only requires that its arguments be iterable. This lets zip work with generators, tuples, dictionary keys and anything else that implements __iter__() (or the older sequence protocol), precisely because it never asks for a length.

This is significant, because generators do not support len(), so there is no way to check the length beforehand. Add an up-front length check, and you break zip's ability to work on generators, when it should. That's a fairly serious disadvantage, wouldn't you agree?
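To illustrate the point about generators, here is a small sketch. The squares() generator is a hypothetical example, not something from the PEP:

```python
def squares():
    """Hypothetical generator, used only for illustration."""
    for n in range(3):
        yield n * n

gen = squares()

# Generators have no length, so checking lengths up front is impossible.
try:
    len(gen)
    has_len = True
except TypeError:
    has_len = False

# zip only needs to iterate, so it works with the generator regardless.
pairs = list(zip("abc", gen))
print(has_len, pairs)  # False [('a', 0), ('b', 1), ('c', 4)]
```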


Reason 3: By Fiat

Guido van Rossum wanted it this way:

Optional padding. An earlier version of this PEP proposed an optional pad keyword argument, which would be used when the argument sequences were not the same length. This is similar behavior to the map(None, ...) semantics except that the user would be able to specify pad object. This has been rejected by the BDFL in favor of always truncating to the shortest sequence, because of the KISS principle. If there's a true need, it is easier to add later. If it is not needed, it would still be impossible to delete it in the future.

KISS trumps everything.
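Padding did eventually become available, though as a separate standard-library function, itertools.zip_longest, rather than as a keyword on zip itself. A short sketch in Python 3, reusing the tuples a and c from the PEP excerpt above:

```python
from itertools import zip_longest

a = (1, 2, 3)
c = (4, 5, 6, 7)

# Plain zip truncates to the shorter input.
truncated = list(zip(a, c))

# zip_longest pads the shorter input; fillvalue defaults to None.
padded = list(zip_longest(a, c, fillvalue=None))

print(truncated)  # [(1, 4), (2, 5), (3, 6)]
print(padded)     # [(1, 4), (2, 5), (3, 6), (None, 7)]
```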

Akshat Mahajan
10

With Python 3.10, zip() gains a new optional strict flag. When it is set and the iterables turn out to have unequal lengths, zip raises a ValueError. This is detailed in PEP 618 and mentioned in the Python 3.10 changelog.

L_W
-1

In my experience, the only reason that you would ever have two lists that happen to have the same length is that they were both constructed from the same source, e.g. they are maps of the same underlying data, they are built inside the same loop, etc. In these cases, rather than creating them separately and then zipping them, I usually just create a single pre-zipped list of tuples. Most of the time that I actually use zip, one of the iterables is infinite, and in these cases I'm glad that it lets me.
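As a sketch of the infinite-iterable case, here is zip paired with itertools.count. The names list is hypothetical sample data:

```python
from itertools import count

names = ["alice", "bob", "carol"]  # hypothetical sample data

# count(1) never ends; zip stops as soon as names is exhausted.
numbered = list(zip(count(1), names))
print(numbered)  # [(1, 'alice'), (2, 'bob'), (3, 'carol')]
```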

Curtis Lusmore