26

I have some use cases in which I need to run generator functions without caring about the yielded items.
I cannot make them non-generator functions because in other use cases I do need the yielded values.

I am currently using a trivial self-made function to exhaust the generators.

def exhaust(generator):
    for _ in generator:
        pass

Is there a simpler way to do this that I'm missing?

Edit: Here is a use case:

def create_tables(fail_silently=True):
    """Create the respective tables."""

    for model in MODELS:
        try:
            model.create_table(fail_silently=fail_silently)
        except Exception:
            yield (False, model)
        else:
            yield (True, model)

In some context, I care about the error and success values…

for success, table in create_tables():
    if success:
        print('Creation of table {} succeeded.'.format(table))
    else:
        print('Creation of table {} failed.'.format(table), file=stderr)

… and in some I just want to run the function "blindly":

exhaust(create_tables())
Richard Neumann
  • 5
    *Why?* What is the purpose of such functions? It smells like a broader design issue, let alone an XY problem. – DeepSpace Nov 23 '17 at 13:26
  • This seems pretty simple already, no? – Chris_Rands Nov 23 '17 at 13:27
  • Updated with actual use case. – Richard Neumann Nov 23 '17 at 13:30
  • 1
    You could simply use `list()` instead of `exhaust()`, if you don't mind the memory impact. – florisla Jul 11 '18 at 07:18
  • 4
    why not `all(generator)` (or `any(generator)`) assuming the return value can be expected to be always "true" (resp. false); in case both can happen, say "generator or true"...? – Max Jan 17 '20 at 02:58
  • @DeepSpace - one example is the generator returned by `Executor.map()` - when passed a function which does not return any value: in this case you still need to exhaust the returned generator in order to raise any exceptions that have occurred in the evaluation. – BeeOnRope Aug 10 '22 at 22:16

4 Answers

29

Using a for loop for this could be relatively expensive. Keep in mind that a for loop in Python is fundamentally the successive execution of simple assignment statements: you'll be executing n assignments (n being the number of items in the generator), only to discard the assignment targets afterwards.

You can instead feed the generator to a zero-length deque. This consumes it at C speed and, unlike list and other callables that materialise iterators/generators, does not use up memory:

from collections import deque

def exhaust(generator):
    deque(generator, maxlen=0)

Taken from the consume recipe in the itertools documentation.
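
As a minimal sketch of how this behaves, the snippet below runs a generator with a visible side effect through the deque-based `exhaust`. The `noisy` generator and the `log` list are hypothetical names made up for illustration; only `deque(..., maxlen=0)` comes from the answer above.

```python
from collections import deque


def exhaust(iterator):
    """Run an iterator to completion, discarding every item."""
    # A deque with maxlen=0 never stores anything, so memory use stays
    # constant while the iteration is driven from deque's C code.
    deque(iterator, maxlen=0)


def noisy(n, log):
    """Hypothetical generator whose side effect is appending to `log`."""
    for i in range(n):
        log.append(i)
        yield i


log = []
gen = noisy(3, log)
exhaust(gen)

print(log)                    # [0, 1, 2]
print(next(gen, "finished"))  # finished
```

The side effects all run even though no yielded value is kept, and the generator is fully consumed afterwards.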

Moses Koledoye
  • Doesn't a `for` loop also run at C-speed and use the same memory? Maybe show some timings? – Chris_Rands Nov 23 '17 at 13:34
  • 1
    @Chris_Rands Yes it runs at C-speed, but not entirely since callbacks are made to Python, to repeat the loop until completion. Besides, the repeated assignment is just extra overhead. – Moses Koledoye Nov 23 '17 at 13:37
  • 2
    Using `deque` seems to be just a tiny bit faster than the plain loop proposed by the OP; with a `consume.generator` function which yields the first 1000 numbers, running `for _ in consume.generator(): pass` takes 71.8 usec per loop for me, `deque(consume.generator(), maxlen=0)` takes 67.4 (according to `timeit`). – Frerich Raabe Nov 23 '17 at 13:40
  • @FrerichRaabe Thanks for metrics. Try scaling up the numbers a little? – Moses Koledoye Nov 23 '17 at 13:41
  • 1
    @MosesKoledoye For consuming the first million numbers, a for loop needs `77.9` ms here, and `deque` needs `70.8`. I.e. `deque` appears to be a bit less than 10% faster here (it doesn't seem to scale better in my tests). – Frerich Raabe Nov 23 '17 at 13:43
  • 13
    Also, funny that whenever someone asks "how to do this simple thing in a pythonic way", everything turns into "how to do it in least microseconds per loop" – Kos Nov 23 '17 at 13:45
  • 4
    @Kos Indeed, especially since `for _ in generator: pass` is not only less magical than the `deque` solution, it's also one character shorter than `deque(generator, maxlen=0)`. :-) – Frerich Raabe Nov 23 '17 at 13:49
  • @FrerichRaabe let alone the char's (and time!) wasted for "from collections import deque"... (and for such 1-liners one might not even need a function... oops, I failed to make one of these "why...?, don't...!" comments I hate so much when I ask "does it exist? / it is possible...?" – Max Jan 17 '20 at 02:44
  • Thanks to all for the performance evaluation of the different solutions. 10% may not look like much of an improvement for a short operation, but when the execution takes over 18h, I will be super happy if I can gain 10% :-) – Skratt Aug 26 '23 at 17:47
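
The timings discussed in these comments could be reproduced with a sketch along the following lines. The `numbers` generator, the item count, and the repeat count are assumptions chosen for illustration, not the setup the commenters used, and the results will vary by machine and Python version.

```python
from collections import deque
from timeit import timeit


def numbers(n):
    """Hypothetical stand-in for the generator timed in the comments."""
    for i in range(n):
        yield i


def exhaust_loop(iterator):
    # Plain for loop, as in the question.
    for _ in iterator:
        pass


def exhaust_deque(iterator):
    # Zero-length deque, as in the answer.
    deque(iterator, maxlen=0)


n = 100_000
loop_time = timeit(lambda: exhaust_loop(numbers(n)), number=20)
deque_time = timeit(lambda: exhaust_deque(numbers(n)), number=20)

print(f"for loop: {loop_time:.4f} s, deque: {deque_time:.4f} s")
```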
8

One very simple and possibly efficient solution could be

def exhaust(generator): all(generator)

if we can assume that the generator always yields truthy values (as in your case, where a two-element tuple (success, table) is truthy even if success and table are both False), or any(generator) if it always yields falsy values. Both functions stop at the first value that contradicts that assumption, so in the "worst case" use all(x or True for x in generator).

Being that short & simple, you might not even need a function for it!
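
The truthiness assumption matters because `all()` and `any()` short-circuit. A small sketch, using a hypothetical `items` generator that records what it has produced:

```python
def items(log):
    """Hypothetical generator that yields a falsy value in the middle."""
    for value in (1, 0, 2):
        log.append(value)
        yield value


# all() stops at the first falsy item, so the generator is NOT exhausted.
log_plain = []
all(items(log_plain))
print(log_plain)    # [1, 0]

# Forcing every item to be truthy makes all() run to the end.
log_forced = []
all(x or True for x in items(log_forced))
print(log_forced)   # [1, 0, 2]

# A non-empty tuple is truthy whatever it contains.
print(bool((False, None)))  # True
```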

Regarding the "why?" comment (I dislike these...): There are many cases where one may want to exhaust a generator. To cite just one, it's a way of doing a for loop as an expression, e.g., any(print(i,x) for i,x in enumerate(S)) - of course there are less trivial examples.

Max
6

Based on your use case it's hard to imagine that there would be sufficiently many tables to create that you would need to consider performance.

Additionally, table creation is going to be much more expensive than iteration.

So the for loop that you already have would seem the simplest and most Pythonic solution - in this case.

mhawke
2

You could just have two functions that each do one thing and call the appropriate one at the appropriate time?

def create_table(model, fail_silently=True):
    """Create the table."""
    try:
        model.create_table(fail_silently=fail_silently)
    except Exception:
        return (False, model)
    else:
        return (True, model)

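def create_tables(MODELS):
    # Create every table and ignore the results.
    for model in MODELS:
        create_table(model)

def iter_create_tables(MODELS):
    # Create every table, yielding a (success, model) tuple for each.
    for model in MODELS:
        yield create_table(model)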

When you care about the returned values do:

for success, table in iter_create_tables(MODELS):
    if success:
        print('Creation of table {} succeeded.'.format(table))
    else:
        print('Creation of table {} failed.'.format(table), file=stderr)

When you don't, just do:

create_tables(MODELS)
abrac
JeffUK