How to get the n next values of a generator into a list

Question

I have made a generator to read a file word by word and it works nicely.

def word_reader(file):
    with open(file) as f:  # closes the file once the generator finishes
        for line in f:
            for p in line.split():
                yield p

reader = word_reader('txtfile')
next(reader)

What is the easiest way of getting the n next values into a list?

Looks like a dupe of http://stackoverflow.com/q/5234090/1709587; I haven't flagged because I need to look carefully and decide which one to close. Probably close this one. — Mark Amery, Oct 20 '15 at 12:29

score 97 · Accepted Answer · edited Feb 13 '23 at 19:57

Use itertools.islice:

import itertools
list(itertools.islice(it, n))
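As a minimal sketch of how this behaves on repeated calls, the snippet below applies `islice` to a small stand-in for the question's `word_reader`. The `words` generator and its sample lines are made up for illustration (they read from a list instead of a file); only `itertools.islice` comes from the answer.

```python
import itertools

def words(lines):
    # Hypothetical stand-in for word_reader: yields words from an iterable of lines.
    for line in lines:
        for word in line.split():
            yield word

it = words(["the quick brown", "fox jumps over", "the lazy dog"])

first = list(itertools.islice(it, 4))   # the next 4 words
second = list(itertools.islice(it, 4))  # continues where the first call stopped
rest = list(itertools.islice(it, 4))    # only 1 word left: a shorter list, no error

print(first)
print(second)
print(rest)
```

Note that each call picks up where the previous one stopped, and that running out of input gives a shorter list rather than an exception.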

answered Nov 11 '10 at 08:04 by Ignacio Vazquez-Abrams · edited Feb 13 '23 at 19:57 by wjandrea

An easy way to think about the arguments of `islice()` is that they exactly mirror the arguments of `range()`: `islice([start,] stop[, step])` (with the limitation that step > 0) – Beni Cherniavsky-Paskin Nov 12 '10 at 10:18

@BeniCherniavsky-Paskin: Although there is one quirk, in that `stop` can be explicitly `None`, which means the `islice` object itself will never stop iterating unless the underlying iterable stops. In that use case, you're trying to skip elements (initial elements for `start`, `step-1` elements between yields for `step > 1`), not truncate the input once you've gotten far enough. `range` doesn't accept `None` as a `stop` value (`itertools.count` fills that niche), so the abstraction using `range` is just a titch leaky. – ShadowRanger Sep 17 '18 at 18:52
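To make the `range()` analogy and the `None` quirk from the two comments above concrete, here is a small sketch. The sample inputs (`count()` and the string `"abcdefg"`) are just illustrative choices.

```python
from itertools import count, islice

# islice(iterable, stop) and islice(iterable, start, stop[, step]) mirror range().
a = list(islice(count(), 5))          # same numbers as range(5)
b = list(islice(count(), 2, 10, 3))   # same numbers as range(2, 10, 3)

# A stop of None: skip the first 3 items, then run until the input itself ends.
c = list(islice(iter("abcdefg"), 3, None))

# A stop of None with a step: every second item, with no cut-off from islice.
d = list(islice(iter("abcdefg"), 0, None, 2))

print(a, b, c, d)
```

With `stop` set to `None` on an infinite input such as `count()`, the `islice` object would never finish, which is the leak in the abstraction that the comment describes.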

score 19 · Answer 2 · edited Jan 27 '22 at 10:43

TL;DR: Use itertools.islice.

Originally I wrote another answer, which turned out to be a bad idea:

[next(it) for _ in range(n)]

This crashes when `it` yields fewer than n values, and the exact behaviour depends on subtle issues, so people reading such code are unlikely to understand its precise semantics.

What happens if `it` is exhausted and `next(it)` raises `StopIteration`?

(i.e. when it had fewer than n values to yield)

When I wrote the above line a couple of years ago, I probably thought a StopIteration would have the clever side effect of cleanly terminating the list comprehension. But no, the whole comprehension crashes, passing the StopIteration upwards. (It would exit cleanly only if the exception originated from the range(n) iterator.)

Which is probably not the behavior you want.
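A minimal sketch of that failure on Python 3, assuming a three-item iterator and n = 5 (both chosen only for illustration):

```python
it = iter([1, 2, 3])
n = 5

try:
    values = [next(it) for _ in range(n)]
except StopIteration:
    # The comprehension did not end early; the exception escaped it,
    # and the three values already fetched are lost.
    values = None
    print("StopIteration escaped the list comprehension")

remaining = list(it)  # nothing left: the iterator was drained before the crash
```

So the caller ends up with neither a full list nor a partial one, and the iterator is already empty.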

But it gets worse. The following is supposed to be equivalent to the list comprehension (especially on Python 3):

list(next(it) for _ in range(n))

It isn't. The inner part is shorthand for a generator function, and list() treats that generator as finished when a StopIteration is raised anywhere inside it.
=> This version copes safely when there are fewer than n values and returns a shorter list. (Like itertools.islice().)

[Tested on: 2.7, 3.4]

But that too is going to change! The fact that a generator silently exits when any code inside it raises StopIteration is a known wart, addressed by PEP 479. From Python 3.7 (or 3.5 with `from __future__ import generator_stop`) that causes a RuntimeError instead of cleanly finishing the generator, i.e. it becomes similar to the list comprehension's behaviour. (Tested on a recent HEAD build)
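For reference, a small sketch of the generator-expression variant under PEP 479 semantics, assuming Python 3.7 or later and the same illustrative three-item iterator:

```python
it = iter([1, 2, 3])
n = 5

try:
    result = list(next(it) for _ in range(n))
    outcome = "short list"       # what Python 2.7 and 3.4 would have produced
except RuntimeError as exc:
    result = None
    outcome = "RuntimeError"     # PEP 479: StopIteration inside a generator is converted
    print(exc)
```

On such an interpreter the `except` branch runs, so neither spelling of the `next(it)` loop degrades gracefully; `islice` remains the safe option.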

Yes, also nice. I think the `islice` solution is a bit nicer, so I will accept that one. – Peter Smit, Nov 11 '10 at 09:02
Of course this answer is much nicer, because it is simpler, needs no extra module to import, has fewer parentheses... Maybe in Python 4 slicing returns generators by default (compare to map in Py3). I'd only change `i` to `_`, to not have "unused variable" warnings in some IDEs ;). BTW, in Haskell it's called `take N`, which is a perfect function. – Tomasz Gandor, Dec 14 '14 at 19:21
Except if n is larger than the generator's length you will get a StopIteration and an undefined variable. – xApple, Aug 07 '15 at 17:28
@xApple oops, you're right! And it's confusingly different if written as list(generator expr.). Edited to explain this, upvoted `islice`. – Beni Cherniavsky-Paskin, Aug 10 '15 at 23:20
If you don't mind spurious values, you can use the default arg of the `next` function and call, for example, `[next(it, None) for _ in range(n)]` – dafinguzman, Oct 10 '19 at 18:41
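A sketch of the default-argument idea from the last comment, plus one possible way to drop the filler values afterwards. The `_MISSING` sentinel is a hypothetical addition, not something the commenter proposed; it is used here because `None` could be a legitimate value in the data.

```python
n = 5

# Padding with None, as suggested in the comment.
it = iter(["a", "b", "c"])
padded = [next(it, None) for _ in range(n)]

# Variant with a unique sentinel so real None values survive the cleanup.
_MISSING = object()  # hypothetical marker, distinct from any real item
it2 = iter(["a", None, "c"])
raw = [next(it2, _MISSING) for _ in range(n)]
trimmed = [v for v in raw if v is not _MISSING]

print(padded)
print(trimmed)
```

This never raises, but it always calls `next` n times, so `islice` is still the shorter route to the same trimmed list.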

score 4 · Answer 3 · answered Nov 11 '10 at 08:59

# xrange is Python 2 only; on Python 3 use range(n) instead.
for word, i in zip(word_reader(file), xrange(n)):
    ...

answered Nov 11 '10 at 08:59 by dan_waterworth

This is bad, because it consumes an extra element from the generator. Beni's answer doesn't do that. – Tomasz Gandor Dec 14 '14 at 19:15

This one-off is avoided if you do `for i, word in zip(xrange(n), word_reader(file)):`. Though I'd prefer a reliable bug over such fragile order-dependent "fix" :-) – Beni Cherniavsky-Paskin Dec 14 '14 at 21:56
Still this seems the simplest using only primitives. – gatopeich Feb 28 '19 at 21:42
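The off-by-one discussed in these comments can be seen in a short Python 3 sketch (using `range` in place of `xrange`, and a made-up five-letter iterator):

```python
n = 2

# Generator first: zip fetches a third item before noticing range(n) is done.
it = iter("abcde")
taken_bad = [word for word, i in zip(it, range(n))]
next_bad = next(it)   # 'c' was silently consumed by zip, so this is 'd'

# range first: zip stops at range(n) before touching the generator again.
it = iter("abcde")
taken_ok = [word for i, word in zip(range(n), it)]
next_ok = next(it)    # 'c' is still available

print(taken_bad, next_bad)
print(taken_ok, next_ok)
```

Both loops yield the same two items; the difference only shows up in what the iterator returns afterwards, which is why the argument order is easy to get wrong.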

score 4 · Answer 4 · answered Jan 03 '18 at 14:11

To get the first n values of a generator, you can use more_itertools.take.

If you plan to iterate over the words in chunks (e.g. 100 at a time), you can use more_itertools.chunked (https://more-itertools.readthedocs.io/en/latest/api.html):

import more_itertools

for words in more_itertools.chunked(reader, n=100):
    # process up to 100 words (the last chunk may be shorter)
    ...

answered Jan 03 '18 at 14:11 by JustAC0der

I looked at the source code of `take` in more_itertools and to me it seems that the definition of `take` is just `list(islice(iterable, n))`. Thus, if you don't want to install a separate package for this, there should be no disadvantage to using the `islice` solution. – jochen Feb 05 '19 at 22:21
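Following that observation, a standard-library-only sketch could look like the code below. The `take` function mirrors the one-line definition quoted in the comment; `chunks` is a hypothetical helper written for this example and is not the more_itertools implementation.

```python
from itertools import islice

def take(n, iterable):
    # Same idea as the definition quoted above: the first n items as a list.
    return list(islice(iterable, n))

def chunks(iterable, n):
    # Hypothetical helper: yield lists of up to n items until the input runs out.
    it = iter(iterable)
    while True:
        chunk = take(n, it)
        if not chunk:
            return
        yield chunk

result = list(chunks(range(7), 3))
print(result)
```

The last chunk is simply shorter when the input length is not a multiple of n.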

score 0 · Answer 5 · answered Mar 17 '19 at 20:09

Use cytoolz.take.

>>> from cytoolz import take
>>> list(take(2, [10, 20, 30, 40, 50]))
[10, 20]

answered Mar 17 '19 at 20:09 by W.P. McNeill
