
What is the most pythonic way to run a generator comprehension (generator expression) to completion when you don't care about the values it produces, because the operations are purely side-effect based?

An example would be splitting a list based on a predicate value, as discussed here. It's natural to think of writing a generator comprehension:

split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)

In this case the best solution I can come up with is to use `any`:

any(gen_comp)

However, it's not immediately obvious what's happening to someone who hasn't seen this pattern. Is there a better way to run through the whole comprehension without holding all the return values in memory?
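For reference, here is a self-contained version of the `any` pattern. It makes the hidden assumption explicit: `any` only walks the whole generator because `list.append` returns `None`, which is falsy, so `any` never sees a value that lets it stop early.

```python
split_me = [0, 1, 2, None, 3, '']
a, b = [], []

# Every value the generator yields is the return value of list.append, i.e. None.
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)

# any() keeps pulling items until it sees a truthy one; since every item is
# None, it exhausts the generator and returns False.
result = any(gen_comp)

print(result)  # False
print(a)       # [1, 2, 3]
print(b)       # [0, None, '']
```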

kalefranz
  • if you do `any(gen_comp)` without assigning it to a variable, it is automatically marked for garbage-collection, but what you're doing is more of a hack, not a regular thing - use `for` loops instead. – Renae Lider Aug 31 '15 at 08:44
  • @RenaeLider Calling a function (for its side effect) without storing the return value is perfectly OK (in many cases). – skyking Aug 31 '15 at 09:00
  • @skyking, it doesn't make sense to cram that into a comprehension though; better to just use a loop. – Cyphase Aug 31 '15 at 09:05
  • @Cyphase I meant the point about garbage collection is not the point (if you don't care about the return value then it's fine if it gets garbage collected). – skyking Aug 31 '15 at 09:07
  • @skyking [*"Particularly tricky is map() invoked for the side effects of the function; the correct transformation is to use a regular for loop (since creating a list would just be wasteful)."*](https://docs.python.org/3/whatsnew/3.0.html#views-and-iterators-instead-of-lists) – jonrsharpe Aug 31 '15 at 09:14
  • @skyking I was talking about `gc` because OP mentioned memory consumption in the question. So as long as they do not assign it to a variable, the interpreter will eventually delete it and free the memory. However, the variable named `gen_comp` is actually a generator _expression_, and iterating over it will not result in accumulating items like in a list/tuple/set/dict comprehension. Each value is yielded and discarded immediately. A generator expression always takes up the same amount of memory. – Renae Lider Aug 31 '15 at 09:48
  • `any(gen_exp)` is cute, but wasteful. A generator expression has more overhead than a simple `for` loop, since a gen exp has to create a new scope and set up the machinery to yield values. A gen exp is slightly more efficient than a generator created using `def`, since you avoid the overhead of a function call, but it still has to do those other two things. – PM 2Ring Aug 31 '15 at 10:34
  • (cont) FWIW, a list comp is slightly faster than doing `append` in a `for` loop because it uses a special `LIST_APPEND` bytecode, and thus avoids a method call. But apart from those things there's no "magical" benefit derived from using list comprehensions or generator expressions over equivalent code using "traditional" for loops. – PM 2Ring Aug 31 '15 at 10:34
  • @PM2Ring So if speed is the concern and `split_me` is a list (or at least not an iterator) one should maybe do it in two steps `a=[v for v in split_me if v]` and `b=[v for v in split_me if not v]`? – skyking Aug 31 '15 at 10:41
  • @skyking: No, that has to scan the list twice, testing each member twice, so code using a normal for loop _should_ beat it: the speed boost from using `LIST_APPEND` isn't huge. – PM 2Ring Aug 31 '15 at 11:00

6 Answers


You do so by not using a generator expression.

Just write a proper loop:

for v in split_me:
    if v:
        a.append(v)
    else:
        b.append(v)

or perhaps:

for v in split_me:
    target = a if v else b
    target.append(v)

Using a generator expression here is pointless if you are going to execute the generator immediately anyway. Why produce an object plus a sequence of None return values when all you wanted was to append values to two other lists?

Using an explicit loop is both more comprehensible for future maintainers of the code (including you) and more efficient.
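If you want to check the efficiency claim on your own machine, a rough `timeit` sketch like the following compares the explicit loop with the `any(...)` version. Timings depend on the Python version and hardware, so no numbers are claimed here; the function names `with_loop` and `with_any` are made up for the comparison.

```python
import timeit

data = [0, 1, 2, None, 3, ''] * 1000


def with_loop(seq):
    # Explicit loop: no generator object, no throwaway None values.
    a, b = [], []
    for v in seq:
        if v:
            a.append(v)
        else:
            b.append(v)
    return a, b


def with_any(seq):
    # Relies on list.append returning None, so any() never stops early.
    a, b = [], []
    any(a.append(v) if v else b.append(v) for v in seq)
    return a, b


# Both approaches must produce the same split before timing means anything.
assert with_loop(data) == with_any(data)

for func in (with_loop, with_any):
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f"{func.__name__}: {seconds:.4f} s")
```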

Martijn Pieters
  • If he wants to use constant memory (e.g. if `split_me` is enormous) then `split_me` should be fed out of a generator. But there's not enough detail to say whether or not that's useful. – Rob Grant Aug 31 '15 at 10:15

The itertools documentation has this consume recipe:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

In your case n is None, so:

collections.deque(gen_comp, maxlen=0)
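Applied to the generator from the question, with the import it needs, the idiom looks like this. With `maxlen=0` the deque discards every item as it arrives, so memory use stays constant, and unlike `any` it does not care whether the yielded values are truthy or falsy.

```python
import collections

split_me = [0, 1, 2, None, 3, '']
a, b = [], []
gen_comp = (a.append(v) if v else b.append(v) for v in split_me)

# A zero-length deque stores nothing: it just drains the iterator at C speed.
collections.deque(gen_comp, maxlen=0)

print(a)  # [1, 2, 3]
print(b)  # [0, None, '']
```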

Which is interesting, but also a lot of machinery for a simple task.

Most people would just use a for loop.

John La Rooy

There's nothing non-pythonic about writing things on several lines and making use of if statements:

for v in split_me:
    if v:
        a.append(v)
    else:
        b.append(v)

If you want a one-liner, you can put the loop on one line anyway:

for v in split_me: a.append(v) if v else b.append(v)

If you want it as an expression (though I can't see why you would, unless there's a value you want to get out of it), you could use a list comprehension to force the looping:

[x for x in (a.append(v) if v else b.append(v) for v in split_me) if False]
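To see why that works: the outer list comprehension drains the inner generator completely, running every `append`, but the `if False` filter rejects every element, so the expression always evaluates to an empty list and only the side effects remain.

```python
split_me = [0, 1, 2, None, 3, '']
a, b = [], []

# The outer comprehension iterates the whole inner generator (running every
# append), but `if False` discards each item, so the result is always [].
result = [x for x in (a.append(v) if v else b.append(v) for v in split_me) if False]

print(result)  # []
print(a)       # [1, 2, 3]
print(b)       # [0, None, '']
```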

Which solution do you think best shows what you're doing? I'd say the first one. To be pythonic, you should consider the Zen of Python, especially:

  • Readability counts.
  • If the implementation is hard to explain, it's a bad idea.
skyking
  • I'm a little uncomfortable with your one-liner, since it uses a conditional expression which produces the same result (None) in both branches, and then throws away the result; OTOH, using an expression for its side effects isn't as bad as using a gen exp or list comp for its side effects. An alternative one-liner is a condensed version of the last example in Martijn's answer: `for v in split_me: (a if v else b).append(v)`, but I'd still prefer to see that on two lines. – PM 2Ring Aug 31 '15 at 10:25
  • I thought I was pretty clear - you **should** be (a little) uncomfortable with them. – skyking Aug 31 '15 at 10:34

As others have said, don't use comprehensions just for side-effects.

Here's a nice way to do what you're actually trying to do using the partition() recipe from itertools:

try:  # Python 3
    from itertools import filterfalse
except ImportError:  # Python 2
    from itertools import ifilterfalse as filterfalse
    from itertools import ifilter as filter


from itertools import tee


def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    # From itertools recipes:
    # https://docs.python.org/3/library/itertools.html#itertools-recipes
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

split_me = [0, 1, 2, None, 3, '']

# partition() returns (false entries, true entries), in that order
falseish, trueish = partition(lambda x: x, split_me)

# You can iterate directly over trueish and falseish,
# or you can put them into lists

trueish_list = list(trueish)
falseish_list = list(falseish)

print(trueish_list)
print(falseish_list)

Output:

[1, 2, 3]
[0, None, '']
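Because `partition` uses `tee`, it also accepts a one-shot iterator, not just a list. A small Python 3 sketch, assuming a generator as input; keep in mind that `tee` buffers items one branch has consumed and the other has not, so draining one side completely before the other can still store up to the whole input.

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    'Use a predicate to partition entries into false entries and true entries'
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


# A generator can only be iterated once; tee lets both halves see every item.
numbers = (n for n in range(10))
evens, odds = partition(lambda n: n % 2, numbers)

evens_list = list(evens)  # drains t1; t2 buffers every item meanwhile
odds_list = list(odds)

print(evens_list)  # [0, 2, 4, 6, 8]
print(odds_list)   # [1, 3, 5, 7, 9]
```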
Cyphase

Just to throw in another reason why using any() to consume a generator is a horrible idea: any() and all() are guaranteed to short-circuit. any() stops as soon as the generator yields a truthy value (and all() as soon as it yields a falsy one), leaving your generator incompletely consumed.

This is adding an extra conditional test / stop condition that you A) probably don't want, and B) may be far away from where the generator is created.

Many standard library functions return None, so you could get away with any() for a while until suddenly it's not doing what you expect, and if you've gotten into the habit of using any() this way, you might stare at that code for a long time before the cause occurs to you.

If you must do something like this, then the consume() recipe from the itertools documentation, i.e. collections.deque(iterator, maxlen=0), is really the only reasonable way to do it, I think.
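A minimal demonstration of the short-circuit problem: here the side-effecting function (a hypothetical `record` helper) returns a truthy value, so any() stops after the first item and the rest of the generator never runs. A zero-length deque, as in the consume() recipe, drains whatever is left regardless of the values.

```python
import collections

seen = []


def record(x):
    """Hypothetical side-effecting helper that happens to return a truthy value."""
    seen.append(x)
    return True


gen = (record(x) for x in range(5))

any(gen)                  # stops at the first truthy result
seen_after_any = list(seen)
print(seen_after_any)     # [0]

collections.deque(gen, maxlen=0)  # consumes the remaining items
print(seen)               # [0, 1, 2, 3, 4]
```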


any() is short, but it is not a general solution. Something that works for any generator is the straightforward

for _ in gen_comp: pass

which is also shorter and more efficient than the version of any() that works in general,

any(None for _ in gen_comp)

so the for loop is really the clearest and best option. Its only downside is that it is a statement, so it cannot be used inside an expression.
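Both fully-consuming forms from this answer can be checked against a generator whose values are truthy, which would make a plain any(gen_comp) stop after the first item. The generator `gen_with_side_effects` below is hypothetical, written only for the demonstration.

```python
log = []


def gen_with_side_effects(n):
    """Hypothetical generator: logs each step and yields a truthy value."""
    for i in range(n):
        log.append(i)
        yield i + 1  # always truthy


# Option 1: a plain for loop that discards every value.
for _ in gen_with_side_effects(3):
    pass

# Option 2: map every value to None so any() can never short-circuit.
result = any(None for _ in gen_with_side_effects(3))

print(result)  # False
print(log)     # [0, 1, 2, 0, 1, 2]
```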

Felix Dombek