0

I am a tutor for an intermediate Python course at a university and I recently had some students come to me for the following problem (code is supposed to add all the values in a list to a set):

mylist = [10, 20, 30, 40]

my_set = set()

(my_set.add(num) for num in mylist)

print(my_set)

Their output was:

set() 

Now, I realized their generator expression is the reason nothing is being added to the set, but I am unsure as to why.

I also realized that using a list comprehension rather than a generator expression:

[my_set.add(num) for num in mylist]

actually adds all the values to the set, although I realize this is memory inefficient because it allocates a list that is never used. The same could be done with a plain for loop and no additional memory.

My question is essentially: why does the list comprehension add to the set, while the generator expression does not? Also, would the generator expression work in place, or would it allocate more memory?

uglygod
  • 2
    Both of these *are terribly unpythonic*. You should **never** use comprehension constructs for side effects. – juanpa.arrivillaga Jan 23 '20 at 01:48
  • 1
    Anyway, `(my_set.add(num) for num in mylist)` creates a generator, then nothing ever happens to the generator, so what would you *expect* it to do? – juanpa.arrivillaga Jan 23 '20 at 01:49
  • @juanpa.arrivillaga I am aware of that, which is why I recommended that the students use a normal `for` loop. I am just trying to understand the inner workings of the language. – uglygod Jan 23 '20 at 01:50
  • 1
    Does this answer your question? [Understanding generators in Python](https://stackoverflow.com/questions/1756096/understanding-generators-in-python) See the accepted answer to that question: *"Observe that a generator object is generated once, but its code is not run all at once. Only calls to next actually execute (part of) the code."* – kaya3 Jan 23 '20 at 01:50
  • 1
    Are you asking what generators are? I don't understand your question "Also would the generator expression be in-place, or would it allocate more memory?" I think this may just be a duplicate of that question as well – juanpa.arrivillaga Jan 23 '20 at 01:51
  • @juanpa.arrivillaga Ahh okay, so the generator would return the result of `my_set.add(num)` every time it is iterated over? – uglygod Jan 23 '20 at 01:51
  • @uglygod yes, generator objects are iterators, created using either a generator expression or a generator function (`def gen_func(): ... yield ...`) – juanpa.arrivillaga Jan 23 '20 at 01:52
  • Anyway, I added an additional duplicate target to a similar question that I had answered a long time ago where I explain this *particular* difference. – juanpa.arrivillaga Jan 23 '20 at 01:55
  • @kaya3 and juanpa.arrivillaga thank you. It seems as though I was just a bit rusty on how Python's generators work. – uglygod Jan 23 '20 at 01:55

2 Answers

3

Generator expressions are lazy: if you don't actually iterate over them, they do nothing (aside from computing the iterator for the outermost loop; in this case, doing work equivalent to iter(mylist) and storing the result for when the genexpr is actually iterated). To make it work, you'd have to run the generator to exhaustion, e.g. using the consume recipe from the itertools documentation:

consume(my_set.add(num) for num in mylist)

# Unoptimized equivalent:
for _ in (my_set.add(num) for num in mylist):
    pass
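Note that consume is not importable from itertools; it is a recipe listed in the itertools documentation. A minimal sketch of it, assuming the simplified form without the recipe's optional n argument, shows that the set stays empty until the generator is actually run:

```python
from collections import deque


def consume(iterator):
    """Advance the iterator to the end, discarding every value it yields."""
    # A deque with maxlen=0 pulls each item and immediately drops it.
    deque(iterator, maxlen=0)


mylist = [10, 20, 30, 40]
my_set = set()

gen = (my_set.add(num) for num in mylist)
size_before = len(my_set)  # still 0: creating the generator ran no add() calls

consume(gen)  # iterating the generator is what triggers the add() calls
print(size_before, sorted(my_set))  # 0 [10, 20, 30, 40]
```

The generator object itself is small and holds no list of results, so consuming it this way avoids the throwaway list that the list comprehension builds.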

In any event, this is an insane thing to do; comprehensions and generator expressions are functional programming tools, and should not have side-effects, let alone be written solely for the purpose of producing side-effects. Code maintainers (reasonably) expect that comprehensions will trigger no "spooky action at a distance"; don't violate that expectation. Just use a set comprehension:

my_set = {num for num in mylist}

or since the comprehension does nothing in this case, the set constructor:

my_set = set(mylist)  # Or with modern unpacking generalizations, my_set = {*mylist}
ShadowRanger
1

Your students (and perhaps you) are using comprehensions as shorthand for loops, which is a bad pattern.

The answer to your question is that the list comprehension needs to be evaluated immediately, as the results are needed to populate the list, while the generator expression is only evaluated as it's being used.
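As a rough illustration of that difference (the names eager_set, lazy_set and the snapshot variables are made up for this sketch), the list comprehension runs every add() call when it is created, while the generator expression runs one call per next():

```python
mylist = [10, 20, 30, 40]

# List comprehension: all four add() calls run immediately.
eager_set = set()
results = [eager_set.add(num) for num in mylist]
# set.add() returns None, so the list built here is four Nones.

# Generator expression: creating it runs no add() calls.
lazy_set = set()
gen = (lazy_set.add(num) for num in mylist)
after_creation = set(lazy_set)  # snapshot taken before any iteration

first = next(gen)  # runs exactly one add() and yields its return value
after_one_step = set(lazy_set)  # snapshot after a single step

print(results)         # [None, None, None, None]
print(after_creation)  # set()
print(after_one_step)  # {10}
```

The list of None values is the extra allocation mentioned in the question; the generator yields the same None values one at a time without storing them.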

You're interested in the side effect of that evaluation, but if the side effect is really the main goal, the code should just be:

my_set = set(mylist)
Grismar