31

One of my friends asked me about this piece of code:

array = [1, 8, 15]
gen = (x for x in array if array.count(x) > 0)
array = [2, 8, 22]
print(list(gen))

The output:

[8]

Where did the other elements go?

Tomerikoo
pickle rick
  • 4
    Theory: The generator expression is evaluated up to the `for x in array` _but not further_ when it is created, _then_ `array` is redefined, _then_ the `count` check is evaluated on the new array. – tobias_k Oct 21 '21 at 10:24
  • 4
    Very interesting question. I am surprised that redefining the `array` has an impact on `gen`. Could lead to very hard-to-find bugs. – Niko Föhr Oct 21 '21 at 10:28
  • 2
    @tobias_k That seems clearly true, as experiments on the second array show – John Coleman Oct 21 '21 at 10:28
  • While this is a duplicate, I believe it is a very valuable signpost based on the significantly different title (for SEO purposes) and the difference in explanations offered here vs the duplicate target. The fact it got such a high score in less than a week lends itself to that point, as well (a higher score than the dupe target got after 3 years). – TylerH Oct 27 '21 at 14:24

2 Answers

32

The answer is in the PEP for generator expressions (PEP 289), in particular the section Early Binding versus Late Binding:

After much discussion, it was decided that the first (outermost) for-expression should be evaluated immediately and that the remaining expressions be evaluated when the generator is executed.

So basically the array in:

x for x in array 

is evaluated using the original list [1, 8, 15] (i.e. immediately), while the other one:

if array.count(x) > 0

is evaluated when the generator is executed using:

print(list(gen))

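at which point array refers to the new list [2, 8, 22]. The generator therefore iterates over 1, 8 and 15 but counts each value in [2, 8, 22], and 8 is the only value that appears in both lists.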
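To make the two evaluation times visible, here is a rough sketch that spells the generator expression out as a generator function. The name `filtered` and its parameter `outer_iterable` are made up for illustration; this approximates the behaviour described in the PEP and is not how the interpreter literally compiles the expression.

```python
def filtered(outer_iterable):
    # Hypothetical stand-in for the generator expression.
    # The argument is bound when filtered(...) is called (early binding).
    for x in outer_iterable:
        # The name `array` is looked up only when this line actually runs,
        # i.e. while the generator is being consumed (late binding).
        if array.count(x) > 0:
            yield x


array = [1, 8, 15]
gen = filtered(array)   # outer_iterable now refers to [1, 8, 15]
array = [2, 8, 22]      # rebinding the name does not affect outer_iterable
result = list(gen)      # the filter now sees [2, 8, 22]
print(result)           # [8]
```

Calling `filtered(array)` evaluates its argument right away but runs none of the body, which mirrors the split between the outermost for-expression and the remaining expressions.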

user2357112
Dani Mesejo
  • Thank you. One more question: since the count is > 0, shouldn't list(gen) contain all of the elements of [2, 8, 22]? But it is only 8. – pickle rick Oct 21 '21 at 11:07
  • 5
    @picklerick No, because the **for-loop** expression is evaluated with the first values of array, basically your code is equivalent to `list(x for x in [1, 8, 15] if [2, 8, 22].count(x) > 0)` – Dani Mesejo Oct 21 '21 at 11:11
  • When the PEP says "the first (outermost) for-expression should be evaluated immediately", that's referring to `array`, not `x for x in array`. It's the expression the `for` iterates over. It doesn't make any sense to try to evaluate `x for x in` immediately. – user2357112 Oct 21 '21 at 20:42
  • 1
    Adding some print calls in the generator expression illuminates what's going on. [Live demo](https://sagecell.sagemath.org/?z=eJxLLCpKrFSwVYg21FGw0FEwNI3lSk_NAwpoVCik5RcpVChk5iloFBRl5pVoJILUaioARaGszDSYVIWOApqsXnJ-KUhCU1PBTsFAkwuiDmi2JlcizE4jsJ1GRrFQ2ZzMYogSTQDI0i6u&lang=python) – PM 2Ring Oct 22 '21 at 04:08
  • `(x for x in iterable if <condition>)` can be rewritten as `tmp = iterable; (x for x in tmp if <condition>)` where `<condition>` *remains exactly the same as before* (and therefore does not have access to `tmp`). – GACy20 Oct 22 '21 at 07:04
12

This becomes clearer if you give each array a unique name instead of rebinding array:

array1 = [1, 8, 15]
gen = (x for x in array1 if array2.count(x) > 0)
array2 = [2, 8, 22]
print(list(gen))

The iterable array1 in x for x in array1 is evaluated when the generator is created, but if array2.count(x) > 0 is evaluated lazily, only when the generator is consumed. That is why the condition can reference array2 before that name has been defined.
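A small experiment along the same lines, offered as a sketch: an undefined name in the filter condition goes unnoticed when the generator is created, while an undefined name in the outermost iterable fails immediately. The names `undefined_iterable`, `bad` and `created` are made up for this example.

```python
array1 = [1, 8, 15]

# Creating the generator does not evaluate the filter condition, so the
# not-yet-defined name `array2` causes no error at this point.
gen = (x for x in array1 if array2.count(x) > 0)

# By contrast, the outermost iterable is evaluated right away, so an
# undefined name there raises NameError at creation time.
try:
    bad = (x for x in undefined_iterable)
    created = True
except NameError:
    created = False

array2 = [2, 8, 22]   # defined before the generator is consumed
result = list(gen)
print(created, result)  # False [8]
```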

Felk