When I append two generators to a list in a loop, the first generator produces the same output as the second. When I unroll the loop, I get the two different outputs I expected.

The following code demonstrates the issue.

import itertools

iterators = itertools.tee(itertools.repeat(('a', 0), 5), 2)
result = []
result.append(r[0] for r in iterators[0])
result.append(r[1] for r in iterators[1])

# As expected
print('Written out...')
print(list(result[0])) # ['a', 'a', 'a', 'a', 'a']
print(list(result[1])) # [0, 0, 0, 0, 0]


# Now do it again but use a loop
iterators = itertools.tee(itertools.repeat(('a', 0), 5), 2)
result = []
for index in [0, 1]:
    result.append(r[index] for r in iterators[index])

# This time both lists contain the second item.
print('With a loop...')
print(list(result[0])) # [0, 0, 0, 0, 0] <--- Huh?!
print(list(result[1])) # [0, 0, 0, 0, 0]

Why does the loop version not work as I expected? What can I do about it?


Solution

Now that this has been closed as a duplicate I can't post another answer, but for the record, here is the solution I finally used.

The problem, as pointed out by @MikeMüller, is that the index used in r[index] is late-bound. The following forces early binding by creating a new local variable i for each value of index in the loop:

for index, it in enumerate(iterators):
    g = lambda i: (r[i] for r in it) # force early binding on index
    result.append(g(index))

(I also liked Mike's suggestion to use generators all the way down, but unfortunately I need the outer generator (for result) to be materialised so I can refer repeatedly to the individual elements in result. Calling list(result) on that nested generator gives the same behaviour as my original loop code.)
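A minimal sketch of that last point, assuming Python 3. The names nested and materialised are only for illustration and are not part of the original code:

```python
import itertools

iterators = itertools.tee(itertools.repeat(('a', 0), 5), 2)

# Nested generators, as in the "generators all the way down" suggestion.
nested = ((r[index] for r in iterators[index]) for index in [0, 1])

# Materialising the outer generator runs its loop to the end, so the
# shared index is already 1 before any inner generator is consumed.
materialised = list(nested)

first = list(materialised[0])
second = list(materialised[1])
print(first)   # [0, 0, 0, 0, 0]
print(second)  # [0, 0, 0, 0, 0]
```

Each inner generator still reads from its own tee iterator, because iterators[index] is evaluated when the inner generator is created. Only r[index] picks up the final value of index.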

Ian Goldby
  • What output are you trying to get? – Patrick Haugh Oct 17 '17 at 15:31
    You only have one `index` variable. Both generators use it. Same as the JavaScript thing, I’ll try to find a Python duplicate: https://stackoverflow.com/questions/750486/javascript-closure-inside-loops-simple-practical-example – Ry- Oct 17 '17 at 15:33
  • Remember the generator doesn't run until you call `list` and you're capturing `index`. Stick the print inside the loop and see what happens. – pvg Oct 17 '17 at 15:36

1 Answer

Fix

You need to consume your iterators earlier:

for index in [0, 1]:
    result.append(list(r[index] for r in iterators[index]))

to get the equivalent effect.

Now:

print('With a loop...')
print(result[0])
print(result[1])

Output:

With a loop...
['a', 'a', 'a', 'a', 'a']
[0, 0, 0, 0, 0]

To illustrate this, set index = 0 after the loop:

iterators = itertools.tee(itertools.repeat(('a', 0), 5), 2)
result = []
for index in [0, 1]:
    result.append(r[index] for r in iterators[index])
index = 0
print('With a loop...')
print(list(result[0]))
print(list(result[1]))

Now the first element of each tuple is returned both times because r[index] always means r[0]:

With a loop...
['a', 'a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a']

Reason

The index is applied lazily, i.e. when you actually convert to a list. Since index is 1 after the loop, both generators use this 1 in r[index] and you get the second element of each tuple twice.
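The same effect can be shown without tee. This is a small sketch, assuming the binding rules described in PEP 289; the names data, gen and values are illustrative and not from the question:

```python
data = [('a', 0), ('b', 1)]
index = 0

# The outermost iterable (data) is evaluated right here.
# The expression r[index] is only evaluated when gen is consumed.
gen = (r[index] for r in data)

index = 1  # rebinding index changes what gen will produce
data = []  # rebinding data does not; gen already holds an iterator over the old list

values = list(gen)
print(values)  # [0, 1]
```

The generator still walks the original two tuples, but it indexes them with the value index has at consumption time.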

Alternative

Use iterators all the way till consumption:

iterators = itertools.tee(itertools.repeat(('a', 0), 5), 2)
result = ((r[index] for r in iterators[index]) for index in [0, 1])
for res in result:
    print(list(res))

Output:

['a', 'a', 'a', 'a', 'a']
[0, 0, 0, 0, 0]
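If the results have to be stored in a list and consumed later, one possible variant (a sketch, not part of the original answer) is to avoid the closure altogether with map and operator.itemgetter. The call itemgetter(index) is evaluated immediately, so each lazy iterator keeps its own index:

```python
import itertools
import operator

iterators = itertools.tee(itertools.repeat(('a', 0), 5), 2)

# itemgetter(index) is built while the list is being created,
# so nothing looks up index again when the map objects are consumed.
result = [map(operator.itemgetter(index), it)
          for index, it in enumerate(iterators)]

first = list(result[0])
second = list(result[1])
print(first)   # ['a', 'a', 'a', 'a', 'a']
print(second)  # [0, 0, 0, 0, 0]
```

In Python 3 map is lazy, so nothing is read from the tee iterators until list is called.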
Mike Müller
  • This is also the reason why if `append` is replaced with `extend` it works great: It consumes the iterators on time. – raratiru Oct 17 '17 at 15:49
  • My first guess was that `index` was being closed over, so it was taking on its last value (1) for both generators. So I tried setting index = -1 after the loop expecting to get an index out of range exception. But I didn't and nothing was changed. (I realise now this is because -1 means the last element, which in context is the same as 1.) – Ian Goldby Oct 17 '17 at 19:07
  • Thanks. Your 'alternative' is the right answer as far as I am concerned. If consuming the iterators earlier was an option then the obvious solution would be to turn the initial iterator of tuples into a list and extract the individual elements with a couple of simple list comprehensions. Far simpler, but no good if the entire sequence won't fit in memory. – Ian Goldby Oct 17 '17 at 19:19
  • Another interesting thing is that while the first `index` in the generator expression is evaluated lazily the second one is [evaluated immediately](https://www.python.org/dev/peps/pep-0289/#early-binding-versus-late-binding) (linked from 1st duplicate), which explains why I didn't see the generator exhausted after the first result. You really have to know the details to use this stuff safely! :-) – Ian Goldby Oct 17 '17 at 19:29
  • Yes, `iterators[index]` is evaluated immediately but `r[index]` is evaluated lazily. This is because `r[index]` is the output expression, which is evaluated once per item and so potentially multiple times, whereas `iterators[index]` is evaluated only once. – Mike Müller Oct 17 '17 at 19:39