
I am having trouble understanding how the zip() function works in Python when an iterator is passed in instead of an iterable.

Have a look at these two print statements:

string = "ABCDEFGHI"

print(list(zip(*[iter(string)] * 3)))
# Output: [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I')]

print(list(zip(*[string] * 3)))
# Output: [('A', 'A', 'A'), ('B', 'B', 'B'), ('C', 'C', 'C'), ('D', 'D', 'D'), ('E', 'E', 'E'), ('F', 'F', 'F'), ('G', 'G', 'G'), ('H', 'H', 'H'), ('I', 'I', 'I')]

Can someone explain how zip() works in both cases?

Yudhishthir Singh
  • The problem is that, although you use `* 3` and expect to get `[iter(string), iter(string), iter(string)]`, all three elements actually point to the same iterator object, so advancing one also advances the other two. It's a little confusing; I hope you understand what I mean. `[string] * 3` also repeats a single string object, but zip creates a separate iterator for each argument, so the three are iterated independently. – Sraw May 29 '20 at 02:32
  • That's also why you cannot create a nested list with just `[[]] * 3` :) – Sraw May 29 '20 at 02:34
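A minimal illustration of the `[[]] * 3` pitfall mentioned in the comments (the variable names are illustrative). The same repetition applied to `[iter(string)]` likewise produces three references to one iterator.

```python
# `[[]] * 3` repeats a reference to ONE inner list three times.
nested = [[]] * 3
nested[0].append("x")
print(nested)  # [['x'], ['x'], ['x']]

# A list comprehension evaluates `[]` on every pass, creating three distinct lists.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)  # [['x'], [], []]
```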

3 Answers


The difference is that [iter(string)] * 3 produces three aliases of a single iterator, so zip pulls every element from the same object. For [string] * 3, zip creates a separate iterator per argument. The output is shorter and has no duplicates because zip exhausts the single aliased iterator three elements at a time.
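To check that the sharing, rather than the use of an iterator as such, causes the grouping, here is a small sketch (variable names are illustrative) that builds three separate iterators with a list comprehension; zip then behaves exactly as it does for [string] * 3.

```python
string = "ABCDEFGHI"

shared = [iter(string)] * 3                  # one iterator object, referenced three times
separate = [iter(string) for _ in range(3)]  # three independent iterator objects

print(len({id(it) for it in shared}))    # 1
print(len({id(it) for it in separate}))  # 3

print(list(zip(*shared)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I')]

# Independent iterators reproduce the output of zip(*[string] * 3).
print(list(zip(*separate)) == list(zip(*[string] * 3)))  # True
```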

See "what is meaning of [iter(list)]*2 in python?" for more details on how [iter(...)] * 2 works and why it can produce unexpected results.

See the canonical answer "List of lists changes reflected across sublists unexpectedly" if the [...] * 3 aliasing behavior is surprising.

ggorlen
  • The string is not copied; as you said initially, it's aliased. The difference is that in order to iterate, Python will create a separate iterator for each of the `zip` arguments when they're strings, but reuse the passed-in iterator when it's passed an iterator. – Karl Knechtel May 29 '20 at 02:35
  • Updated to be a bit clearer about what's going on. Thanks for the suggestion. – ggorlen May 29 '20 at 02:38
  • Great explanation, thanks for the additional content. – Yudhishthir Singh May 29 '20 at 12:46

Let's use a clearer example:

a = iter("123456")  # One iterator 
list(zip(a, a, a)) 
# [('1', '2', '3'), ('4', '5', '6')]

vs

a = iter("123456")
b = iter("123456")
c = iter("123456")
list(zip(a, b, c))
# [('1', '1', '1'), ('2', '2', '2'), ('3', '3', '3'), ('4', '4', '4'), ('5', '5', '5'), ('6', '6', '6')]

In the first example, a can only yield 6 elements in total, and it has to yield 3 of them every time zip builds a tuple, so zip produces just 2 tuples. In contrast, the three iterators in the second example hold 18 elements in total, which zip yields in 6 groups of 3.
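To make the counting concrete, here is a sketch using a hypothetical CountingIterator wrapper (not a standard-library class) that records how many elements zip actually pulls from each source.

```python
class CountingIterator:
    """Hypothetical helper: wraps an iterable and counts how many items are pulled."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.pulled = 0

    def __iter__(self):
        return self  # an iterator is its own iterator

    def __next__(self):
        value = next(self._it)  # propagates StopIteration once exhausted
        self.pulled += 1
        return value


# One iterator passed three times: zip pulls 3 items per tuple from the same source.
a = CountingIterator("123456")
print(list(zip(a, a, a)), a.pulled)
# [('1', '2', '3'), ('4', '5', '6')] 6

# Three independent iterators: 18 items in total, grouped into 6 tuples.
x, y, z = (CountingIterator("123456") for _ in range(3))
print(len(list(zip(x, y, z))), x.pulled + y.pulled + z.pulled)
# 6 18
```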

Patrick Haugh

To understand this properly, first make sure you understand the relevant fundamentals: the Python iterator protocol and object identity versus equality.

print(list(zip(*[string] * 3)))

We compute the list [string] * 3 and pass each element of that list to zip, so it's the same as if we had written zip(string, string, string). Python implicitly creates a separate iterator for each argument (even though it's the same string object in each case, not copies!), and each one iterates over the same string independently. The first time through, each iterator yields 'A', and so on.
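A small sketch of this first case, imitating by hand the iter() calls that zip makes on its arguments (args and iterators are illustrative names, not part of zip's API):

```python
string = "ABCDEFGHI"
args = [string] * 3

# The list holds the very same str object three times; nothing is copied.
print(all(arg is string for arg in args))  # True

# zip calls iter() on each argument; for a str, every call returns a new iterator.
iterators = [iter(arg) for arg in args]
print(len({id(it) for it in iterators}))  # 3

# Each fresh iterator starts at the beginning, so the first tuple repeats 'A'.
print(tuple(next(it) for it in iterators))  # ('A', 'A', 'A')
```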

print(list(zip(*[iter(string)] * 3)))

Now we have made a single iterator for the string and passed it to zip in three argument positions. Python's iteration logic attempts to "create an iterator" for that iterator in each case, but the iterator of an iterator is itself (not a copy of itself!). So once iteration begins, zip uses the same iterator object three times to build each of its outputs, and every request for the next element advances that shared iterator past the elements taken by the previous requests. So the first time through, zip requests three elements from the same iterator and gets 'A', 'B' and 'C'.
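Both cases can be reproduced with a simplified pure-Python stand-in for zip. simple_zip is a hypothetical name; it roughly follows the equivalent code given in the Python documentation but leaves out the strict= option, so treat it as a sketch rather than the real implementation.

```python
def simple_zip(*iterables):
    """Simplified sketch of zip(): ignores the strict= option."""
    # iter(x) returns x itself when x is already an iterator.
    iterators = [iter(obj) for obj in iterables]
    if not iterators:
        return
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                return  # stop as soon as any input is exhausted
        yield tuple(items)


string = "ABCDEFGHI"
it = iter(string)
print(iter(it) is it)  # True: an iterator's iterator is itself
print(list(simple_zip(it, it, it)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I')]
print(list(simple_zip(string, string, string))[:2])
# [('A', 'A', 'A'), ('B', 'B', 'B')]
```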

Karl Knechtel