73

I have made a generator to read a file word by word and it works nicely.

def word_reader(file):
    # The with block closes the file once the generator is exhausted or closed
    with open(file) as f:
        for line in f:
            for p in line.split():
                yield p

reader = word_reader('txtfile')
next(reader)

What is the easiest way of getting the next n values into a list?

wjandrea
Peter Smit
  • 2
    Looks like a dupe of http://stackoverflow.com/q/5234090/1709587; I haven't flagged because I need to look carefully and decide which one to close. Probably close this one. – Mark Amery Oct 20 '15 at 12:29

5 Answers

97

Use itertools.islice:

list(itertools.islice(it, n))
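As a minimal sketch of how this behaves with a generator like the one in the question: the `word_reader` below is a stand-in that takes an iterable of lines rather than a filename (an assumption, so the example runs without a file), and the sample lines are made up.

```python
import itertools

def word_reader(lines):
    # Stand-in for the question's generator: accepts any iterable of lines
    # (an open file object would work too) instead of a filename.
    for line in lines:
        for word in line.split():
            yield word

reader = word_reader(["the quick brown", "fox jumps", "over the lazy dog"])

first_four = list(itertools.islice(reader, 4))  # ['the', 'quick', 'brown', 'fox']
next_four = list(itertools.islice(reader, 4))   # continues where the first call stopped
rest = list(itertools.islice(reader, 4))        # only one word is left: ['dog']
```

Each call consumes words from the same generator, and a call that runs out of input simply returns a shorter list instead of raising.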
wjandrea
Ignacio Vazquez-Abrams
  • 7
    An easy way to think about the arguments of `islice()` is that they exactly mirror the arguments of `range()`: `islice([start,] stop[, step])` (with the limitation that step > 0) – Beni Cherniavsky-Paskin Nov 12 '10 at 10:18
  • 3
    @BeniCherniavsky-Paskin: Although there is one quirk, in that `stop` can be explicitly `None`, which means the `islice` object itself will never stop iterating unless the underlying iterable stops. In that use case, you're trying to skip elements (initial elements for `start`, `step-1` elements between yields for `step > 1`), not truncate the input once you've gotten far enough. `range` doesn't accept `None` as a `stop` value (`itertools.count` fills that niche), so the abstraction using `range` is just a titch leaky. – ShadowRanger Sep 17 '18 at 18:52
19

TL;DR: Use itertools.islice.

Originally I wrote another answer that turned out to be a bad idea:

[next(it) for _ in range(n)]

This crashes when the iterator yields fewer than n values, and the exact behaviour depends on subtle issues, so people reading such code are unlikely to understand its precise semantics.

What happens if the iterator is exhausted and next(it) raises StopIteration?

(i.e. when it had fewer than n values to yield)

When I wrote the above line a couple of years ago, I probably thought a StopIteration would have the clever side effect of cleanly terminating the list comprehension. But no, the whole comprehension will crash, passing the StopIteration upwards. (It'd exit cleanly only if the exception originated from the range(n) iterator.)

Which is probably not the behavior you want.

But it gets worse. The following is supposed to be equivalent to the list comprehension (especially on Python 3):

list(next(it) for _ in range(n))

It isn't. The inner part is shorthand for a generator function; list() treats the generator as finished when StopIteration is raised anywhere inside it.
=> This version copes safely when there are fewer than n values and returns a shorter list. (Like itertools.islice().)

[Executions on: 2.7, 3.4]

But that, too, is going to change! The fact that a generator silently exits when any code inside it raises StopIteration is a known wart, addressed by PEP 479. From Python 3.7 (or 3.5 with from __future__ import generator_stop) that's going to cause a RuntimeError instead of cleanly finishing the generator. I.e. it'll become similar to the list comprehension's behaviour. (Tested on a recent HEAD build)
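The three variants discussed in this answer can be compared side by side. This is a sketch assuming Python 3.7 or later, where PEP 479 is in effect by default; the `describe` helper and the `strategies` dict are names made up for illustration.

```python
import itertools

def describe(take, n=5):
    """Run one 'take n' strategy on a fresh 3-item iterator and report what happens."""
    it = iter([1, 2, 3])
    try:
        return take(it, n)
    except (StopIteration, RuntimeError) as exc:
        # Report the exception type instead of crashing the comparison.
        return type(exc).__name__

strategies = {
    # Plain list comprehension: the StopIteration from next() propagates as-is.
    "list comprehension": lambda it, n: [next(it) for _ in range(n)],
    # Generator expression: PEP 479 turns the StopIteration into a RuntimeError.
    "generator expression": lambda it, n: list(next(it) for _ in range(n)),
    # islice: stops quietly and yields whatever was available.
    "islice": lambda it, n: list(itertools.islice(it, n)),
}

results = {name: describe(take) for name, take in strategies.items()}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

On Python 3.7+ only the `islice` variant returns a (shorter) list; the other two fail with `StopIteration` and `RuntimeError` respectively.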

Neuron
Beni Cherniavsky-Paskin
  • 1
    Yes, also nice. I think the `islice` solution is a bit nicer, so I will accept that one. – Peter Smit Nov 11 '10 at 09:02
  • Of course this answer is much nicer, because it is simpler, needs no extra module to import, has fewer parentheses... Maybe in Python 4 slicing returns generators by default (compare to map in Py3). I'd only change `i` to `_`, to not have "unused variable" warnings in some IDEs ;). BTW, in Haskell it's called `take N`, which is a perfect function. – Tomasz Gandor Dec 14 '14 at 19:21
  • 1
    Except if n is larger than the generator's length you will get a StopIteration and an undefined variable. – xApple Aug 07 '15 at 17:28
  • @xApple oops, you're right! And it's confusingly different if written as list(generator expr.). Edited to explain this, upvoted `islice`. – Beni Cherniavsky-Paskin Aug 10 '15 at 23:20
  • 2
    If you don't mind spurious values, you can use the default arg of the `next` function and call, for example `[next(it, None) for _ in range(n)]` – dafinguzman Oct 10 '19 at 18:41
4
# range (xrange on Python 2) goes first so zip stops before pulling an extra word
for i, word in zip(range(n), word_reader(file)):
    ...
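One caveat with zip-based approaches matters if the generator is reused afterwards: the argument order decides whether zip pulls one extra item before it notices the counter is exhausted. A small sketch, using a made-up `numbers` generator in place of the word reader:

```python
def numbers():
    # Hypothetical stand-in for the word generator.
    yield from [1, 2, 3, 4, 5]

# Generator first: zip fetches a third item, then finds range(2) exhausted
# and throws that item away.
gen_first = numbers()
taken_a = [x for x, _ in zip(gen_first, range(2))]
left_a = list(gen_first)

# Counter first: zip sees range(2) exhausted before touching the generator.
counter_first = numbers()
taken_b = [x for _, x in zip(range(2), counter_first)]
left_b = list(counter_first)

print(taken_a, left_a)  # [1, 2] [4, 5]
print(taken_b, left_b)  # [1, 2] [3, 4, 5]
```

Both orders yield the same first two items; they differ only in what remains in the generator.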
dan_waterworth
4

To get the first n values of a generator, you can use more_itertools.take.

If you plan to iterate over the words in chunks (e.g. 100 at a time), you can use more_itertools.chunked (https://more-itertools.readthedocs.io/en/latest/api.html):

import more_itertools
for words in more_itertools.chunked(reader, n=100):
    # process up to 100 words (the last chunk may be shorter)
    ...
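If installing a separate package is not an option, the same chunking idea can be sketched with the standard library alone. The `chunks_of` helper below is a hypothetical name and is not the more_itertools implementation; it just repeats `islice` until the input runs dry.

```python
import itertools

def chunks_of(iterable, n):
    """Yield successive lists of up to n items from iterable."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return  # input exhausted: end the generator cleanly
        yield chunk

chunks = list(chunks_of(range(7), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The last chunk is shorter when the number of items is not a multiple of n.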
JustAC0der
  • 5
    I looked at the source code of `take` in more_itertools and to me it seems that the definition of `take` is just `list(islice(iterable, n))`. Thus, if you don't want to install a separate package for this, there should be no disadvantage to using the `islice` solution. – jochen Feb 05 '19 at 22:21
0

Use cytoolz.take.

>>> from cytoolz import take
>>> list(take(2, [10, 20, 30, 40, 50]))
[10, 20]
W.P. McNeill