How to efficiently exhaust an iterator in a one-liner?

Question

If I have an iterator `it` and want to exhaust it, I can write:

for x in it:
    pass

Is there a builtin or standard-library call that lets me do it in a one-liner? Of course I could do:

list(it)

which builds a list from the iterator and then discards it. But I consider that inefficient because of the list-building step. It's of course trivial to write a helper function that does the empty for loop, but I am curious whether there is something else I am missing.

I'm not sure what you want, but does `[None for _ in it]` (or some variant of it) do the job? — Mathias711, Apr 21 '16 at 07:48
Assuming you're not doing this for side effects, what's the benefit of exhausting the iterator over just discarding it? — snakecharmerb, Apr 21 '16 at 07:55
It's for side effects indeed. Any solution that builds up a list is less efficient than the ``collections.deque(it, maxlen=0)`` solution below. — hpk42, Apr 23 '16 at 06:25
Related: [Fastest (most Pythonic) way to consume an iterator](https://stackoverflow.com/q/50937966/8746648) — asynts, Sep 07 '22 at 08:37

score 26 · Accepted Answer · answered Apr 21 '16 at 07:50

From the itertools recipes:

    # feed the entire iterator into a zero-length deque
    collections.deque(iterator, maxlen=0)
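
A minimal sketch of the idiom in use, assuming the iterator is consumed only for its side effects. The `log` list and the `record` generator are hypothetical names made up for this illustration.

```python
from collections import deque

log = []  # hypothetical sink that records the side effects


def record(values):
    """Hypothetical generator whose only purpose is its side effect."""
    for v in values:
        log.append(v)
        yield v


it = record(range(5))
result = deque(it, maxlen=0)      # pulls every item, stores none

leftover = next(it, "exhausted")  # the iterator has nothing left to give
```

The deque stays empty, so nothing is accumulated while the generator runs to completion.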

Ignacio Vazquez-Abrams

Also briefly explained here: http://code.activestate.com/lists/python-ideas/23364/ – Reblochon Masque Apr 21 '16 at 08:04

Thanks. I had tried Google and Stack Overflow search and had not found the solution. It should be easier to discover now :) A ``collections.exhaust_iterator`` or ``itertools.exhaust_iterator`` would be nicer and more obvious, though. – hpk42 Apr 23 '16 at 06:23

Kelly Bundy · Answer 2 · 2022-09-09T13:21:44.933

2022 update (bounty asks): There's no "dedicated function" for it in the standard library, and deque(it, 0) is still the most efficient. That's why it's used in itertools's consume recipe and more-itertools's consume function (click on [source] there).
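
For context, the consume recipe in the itertools documentation is along the following lines. Treat this as a sketch and the documentation as authoritative; the usage lines at the bottom are my own illustration.

```python
from collections import deque
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # Feed the entire iterator into a zero-length deque.
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


it = iter(range(10))
consume(it, 3)           # skip the first three items
first_left = next(it)    # the item at index 3
consume(it)              # drain whatever remains
```

The `n` branch is unrelated to full exhaustion; it is shown only because it is part of the same recipe.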

Benchmark of the various proposals, iteration time per element, iterating itertools.repeat(None, 10**5) (with CPython 3.10):

  2.7 ns ± 0.1 ns consume_deque
  6.5 ns ± 0.0 ns consume_loop
  6.5 ns ± 0.0 ns consume_all_if_False
 13.9 ns ± 0.3 ns consume_object_in
 27.0 ns ± 0.1 ns consume_all_True
 29.4 ns ± 0.3 ns consume_sum_0
 44.8 ns ± 0.1 ns consume_reduce

The deque one wins because it is implemented in C and has a fast path for maxlen == 0 that does nothing with the elements.

The simple loop gets second place, fastest with Python iteration. The other solutions previously proposed here waste more or less time by doing more or less work with each element. I added consume_all_if_False to show how to do an all/sum efficiently: have an if False clause so your generator doesn't produce anything.
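
To illustrate the `if False` trick described above: the filter rejects every item, so the generator expression never hands anything to `all`, yet the underlying iterator is still advanced to its end. The `counting` helper and the `seen` list are hypothetical, used only to make the consumption visible.

```python
def counting(n, seen):
    """Hypothetical source that records every item it hands out."""
    for i in range(n):
        seen.append(i)
        yield i


seen = []
# all() receives an empty stream and therefore returns True,
# but the source has been run to completion inside the generator.
result = all(_ for _ in counting(4, seen) if False)
```

By contrast, `all(True for _ in it)` passes one value per element to `all`, which is the extra per-element work the benchmark measures.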

Benchmark code:

from collections import deque
from functools import reduce
from itertools import repeat
from random import shuffle
from statistics import mean, stdev
from timeit import default_timer as timer

def consume_loop(it):
    for _ in it:
        pass

def consume_deque(it):
    deque(it, 0)

def consume_object_in(it):
    object() in it

def consume_all_True(it):
    all(True for _ in it)

def consume_all_if_False(it):
    all(_ for _ in it if False)

def consume_sum_0(it):
    sum(0 for _ in it)

def consume_reduce(it):
    reduce(lambda x, y: y, it)

funcs = [
    consume_loop,
    consume_deque,
    consume_object_in,
    consume_all_True,
    consume_all_if_False,
    consume_sum_0,
    consume_reduce,
]

# Per-element timings (in seconds) collected for each function.
times = {f: [] for f in funcs}

def stats(f):
    # Summarize the five fastest runs, converted to nanoseconds.
    ts = [t * 1e9 for t in sorted(times[f])[:5]]
    return f'{mean(ts):5.1f} ns ± {stdev(ts):3.1f} ns'

# 25 rounds, shuffling the order each round to even out ordering effects.
for _ in range(25):
    shuffle(funcs)
    for f in funcs:
        n = 10**5
        it = repeat(None, n)
        t0 = timer()
        f(it)
        t1 = timer()
        times[f].append((t1 - t0) / n)

# Print the results sorted by their formatted summary string.
for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

As a supplement to "a fast path for this special case", CPython has a specially written [internal function](https://github.com/python/cpython/blob/30cc1901efa18180a83bf8402df9e1c10d877c49/Modules/_collectionsmodule.c#L365) for consuming an iterator, which is [called](https://github.com/python/cpython/blob/main/Modules/_collectionsmodule.c#L399) in `deque.extend` (and `deque.extendleft`) when `maxlen` is 0. — Mechanic Pig, Sep 09 '22 at 11:03
@MechanicPig Alright, added links after all, thanks. Was too lazy last night. — Kelly Bundy, Sep 09 '22 at 13:23
Interestingly it was added [here](https://github.com/python/cpython/commit/060c7f6bbafdaeb4b73ce34f1bb34e4ac76f2d92). Another competitive solution is to use `itertools.islice` such as `for _ in islice(it, None, None, 1_000_000): pass`, which makes more sense to me in case they ever decide to remove the fast path for `deque`, which would honestly seem fair to me. — Simply Beautiful Art, Nov 25 '22 at 04:59
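
A small sketch of the `islice` variant mentioned in the comment above, with a loop counter added purely for illustration. The idea is that `islice` skips the intermediate items itself, so the Python-level loop body runs only once per million elements.

```python
from itertools import islice

it = iter(range(2_500_000))
loops = 0
# With a step of 1_000_000, only the items at index 0, 1_000_000 and
# 2_000_000 reach the for loop; everything in between is skipped by islice.
for _ in islice(it, None, None, 1_000_000):
    loops += 1

leftover = next(it, None)  # the underlying iterator is fully consumed
```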

score 3 · Answer 3 · answered Dec 03 '20 at 10:28

Note that your suggestion can also be formulated as a one-liner:

for _ in it: pass

And I just made:

def exhaust(it):
    for _ in it:
        pass

It's not as fast as the deque solution (10% slower on my laptop), but I find it cleaner.

Yuval


score 0 · Answer 4 · answered May 31 '18 at 10:12

object() in it

If you know the iterator will never produce a certain kind of object, you can also use that instead, e.g. None in it or () in it. The newly-created object() works pretty much universally, because it'll never test equal to anything else (barring shenanigans).

I'm not advocating this idiom; the for loop in the question is in many ways the best solution. But if you're looking for a creepily "elegant" answer in the sense that it does the minimum possible side-computation while still being a very neat one-liner (as opposed to e.g. any(False for _ in it)) then this may be it.
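
A short sketch of why the choice of sentinel matters, assuming a plain iterator without its own `__contains__`: the membership test stops at the first match, so a value that can actually occur leaves the rest of the iterator unconsumed. The sample data is made up for illustration.

```python
it = iter([1, 2, None, 3, 4])
found = None in it           # stops as soon as None is encountered
remaining = list(it)         # the items after the match were never consumed

it2 = iter([1, 2, None, 3, 4])
found2 = object() in it2     # a fresh object matches none of these items
remaining2 = list(it2)       # nothing is left
```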

score -1 · Answer 5 · answered Apr 21 '16 at 07:51

You could use sum:

sum(0 for _ in it)

or similarly, using reduce:

reduce(lambda x, y: y, it)
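
One caveat with the `reduce` form, sketched here: without an initial value, `functools.reduce` raises `TypeError` on an empty iterator. Passing an initial value avoids that. The `last_or_none` helper is a hypothetical name for illustration, not part of the answer.

```python
from functools import reduce


def last_or_none(it):
    """Hypothetical helper: drain `it`, tolerating an empty iterator."""
    return reduce(lambda x, y: y, it, None)


try:
    reduce(lambda x, y: y, iter([]))   # no initial value supplied
    raised = False
except TypeError:
    raised = True

drained = iter(range(3))
last = last_or_none(drained)           # also returns the final item
empty_result = last_or_none(iter([]))  # no exception on empty input
```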

tobias_k

According to my benchmark, these are the two slowest, really not doing it "efficiently" (granted, I measured with newer Python, but I believe they were inefficient back then already). – Kelly Bundy Sep 09 '22 at 02:41

Phil Kang · Answer 6 · 2019-02-07T11:18:39.073

-1

The built-in all() function should be extremely cheap and simple:

all(True for _ in it)

Edit: Fixed, thank you @hemflit !

edited Feb 07 '19 at 11:18

answered Jan 29 '19 at 10:49

No, this will stop iterating on first falsy element. `all(True for _ in it)` would do it though. – hemflit Feb 06 '19 at 09:42
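
A quick sketch of the difference this comment points out, using made-up sample data: calling `all` directly on the iterator short-circuits at the first falsy element, while the generator form feeds it only `True` values.

```python
it = iter([1, 0, 2, 3])
all(it)                        # stops at the first falsy element (0)
left_after_all = list(it)      # items after the 0 were not consumed

it = iter([1, 0, 2, 3])
all(True for _ in it)          # every generated value is True
left_after_fix = list(it)      # the iterator is exhausted
```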
