How to read file N lines at a time?

Question

I need to read a big file by reading at most N lines at a time, until EOF. What is the most effective way of doing it in Python? Something like:

with open(filename, 'r') as infile:
    while not EOF:
        lines = [get next N lines]
        process(lines)

Quick very silly question: Will whatever you are going to do inside `process(lines)` work if N == 1? If not, you have a problem with a potential single line in the last bunch. If it does work with N == 1, then it would be much more efficient just to do `for line in infile: work_on(line)`. — John Machin, May 01 '11 at 00:18
@JohnMachin While it may work for N == 1, it may not be efficient. Think mini-batch gradient descent in deep learning. — max_max_mir, May 26 '21 at 00:36

score 48 · Accepted Answer · edited May 01 '11 at 02:31

One solution would be a list comprehension and the slice operator:

with open(filename, 'r') as infile:
    lines = [line for line in infile][:N]

After this, lines is a list of (at most) the first N lines. However, this loads the complete file into memory. If you don't want that (i.e. if the file could be really large), there is another solution using islice from the itertools module, which reads lazily:

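from itertools import islice

with open(filename, 'r') as infile:
    lines_gen = islice(infile, N)
    # consume the iterator inside the with block, while the file is still open
    for line in lines_gen:
        print(line, end='')

lines_gen is a lazy iterator (an itertools.islice object) that yields each of the first N lines of the file. It has to be consumed before the with block ends; once the file is closed, iterating over it raises ValueError.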

Both solutions give you up to N lines (or fewer, if the file doesn't have that many).
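As the comments below point out, this reads only one batch. A minimal sketch of reading the whole file in batches by calling islice repeatedly until it comes back empty, assuming Python 3 (read_in_batches is a hypothetical helper name, and io.StringIO stands in for a real file). It relies on the two-argument form iter(callable, sentinel), which keeps calling the callable until it returns a value equal to the sentinel, here an empty list:

```python
import io
from itertools import islice

def read_in_batches(infile, n):
    """Return an iterator over lists of at most n lines, until EOF."""
    # iter(callable, sentinel) calls the lambda repeatedly and stops as soon
    # as it returns [], which is what list(islice(...)) gives at EOF.
    return iter(lambda: list(islice(infile, n)), [])

# Usage with an in-memory file standing in for a real one
demo = io.StringIO("a\nb\nc\nd\ne\n")
batches = list(read_in_batches(demo, 2))
print(batches)  # [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

Only one batch is held in memory at a time, and the last batch may be shorter than n.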

edited May 01 '11 at 02:31

tzot

answered Apr 29 '11 at 13:55

Martin Thurau

Simplified to `lines = islice(infile, N)` – madprogrammer Apr 29 '11 at 14:15
Note: it reads N lines and stops. To read the next N lines, you could wrap your code in a loop (until EOF) or use the grouper recipe as shown in my answer. – jfs Dec 02 '16 at 12:48
This solution doesn't answer the question of "how do I read N lines at a time until EOF". It only goes so far as to provide the mechanism for reading N lines at a time, but then only demonstrates reading N lines one at a time (the for loop at the end). – kfsone Dec 13 '16 at 18:53
The OP states **I need to read a big file by reading at most N lines at a time**, and your first solution loads all lines into memory?! Maybe you should not even consider that first solution and remove it from your answer!!! – nbro Sep 27 '19 at 15:20
This answer demonstrates something useful—but not a solution to the original question as asked. – Daniel Standage Apr 06 '22 at 19:56

score 23 · Answer 2 · edited Jun 18 '21 at 20:02

A file object is an iterator over lines in Python. To iterate over the file N lines at a time, you could use the grouper() function from the Itertools Recipes section of the documentation. (Also see What is the most “pythonic” way to iterate over a list in chunks?):

try:
    from itertools import izip_longest
except ImportError:  # Python 3
    from itertools import zip_longest as izip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

Example

with open(filename) as f:
    for lines in grouper(f, N, ''):
        # the last group is padded with '' if the line count isn't a multiple of N
        assert len(lines) == N
        # process N lines here
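Because fillvalue='' pads the final group, the assert above always passes, even when the last batch really contains fewer than N lines, so the padding has to be filtered out before processing. A minimal Python 3 sketch of a hypothetical grouper_trimmed() variant that pads with a private sentinel object and strips it again, which also keeps it safe for iterables whose real items might equal the fill value (io.StringIO stands in for a real file):

```python
import io
from itertools import zip_longest

_FILL = object()  # private sentinel; it can never equal a real line

def grouper_trimmed(iterable, n):
    """Like grouper(), but drop the padding from the final short group."""
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_FILL):
        # only the last group can contain sentinel padding
        yield [item for item in group if item is not _FILL]

f = io.StringIO("1\n2\n3\n4\n5\n")
print(list(grouper_trimmed(f, 2)))  # [['1\n', '2\n'], ['3\n', '4\n'], ['5\n']]
```

On Python 3.12 and later, itertools.batched(f, N) yields tuples of at most N items with no padding at all.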

edited Jun 18 '21 at 20:02

martineau

answered Apr 30 '11 at 22:49

jfs

@Kevin J. Chase: 1- binary file is an iterator over `b'\n'`-lines 2- `itertools.izip_longest` is not removed in Python 3, it is renamed to `itertools.zip_longest` – jfs May 27 '16 at 06:51
I mostly wanted to update that link, since the code only works _as written_ in Python 2, and unspecified links to docs.python.org seem to default to 3 instead of 2 now. 1: True enough. 2: It's debatable which of the `zip` / `izip` functions got "removed" in Python 3 --- the code for one is missing, the name for the other is. – Kevin J. Chase May 27 '16 at 07:36
I don't mind the edit. The comment is for your benefit. [`itertools.zip_longest()`](https://github.com/python/cpython/blob/2686f50b3bfd921dfc4f6eb0b663be56a73357ae/Modules/itertoolsmodule.c#L4272) in Python 3 and [`itertools.izip_longest()`](https://github.com/python/cpython/blob/c8f718080e60aa93842e20ee95dd90b29cb570b9/Modules/itertoolsmodule.c#L3828) in Python 2 are the same object. – jfs May 27 '16 at 07:44
@martineau: why did you remove the python2 shebang? `izip_longest` is not available in Python 3 (it is renamed there to `zip_longest`) – jfs Jun 18 '21 at 19:36

score 16 · Answer 3 · edited Oct 19 '17 at 03:55

This code works with any number of lines in the file and any N. If the file has 1100 lines and N = 200, process() is called five times with chunks of 200 lines and once with the remaining 100 lines.

with open(filename, 'r') as infile:
    lines = []
    for line in infile:
        lines.append(line)
        if len(lines) >= N:
            process(lines)
            lines = []
    if len(lines) > 0:
        process(lines)
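To sanity-check the 1100-line example, a minimal sketch that wraps the same accumulate-and-flush loop in a hypothetical chunk_sizes() function and records each chunk's size instead of calling process(), with io.StringIO standing in for a real file:

```python
import io

def chunk_sizes(infile, n):
    """Run the accumulate-and-flush loop, recording each chunk's size."""
    sizes = []
    lines = []
    for line in infile:
        lines.append(line)
        if len(lines) >= n:
            sizes.append(len(lines))  # stands in for process(lines)
            lines = []
    if lines:
        sizes.append(len(lines))      # leftover partial chunk at EOF
    return sizes

fake_file = io.StringIO("".join(f"line {i}\n" for i in range(1100)))
print(chunk_sizes(fake_file, 200))  # [200, 200, 200, 200, 200, 100]
```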

duhaime · Answer 4 · 2020-11-19T13:02:41.730

I needed to read n lines at a time from extremely large files (~1 TB) and wrote a simple package to do this. If you pip install bigread, you can do:

from bigread import Reader

stream = Reader(file='large.txt', block_size=10) 
for i in stream:
  print(i)

block_size is the number of lines to read at a time.

This package is no longer maintained. I now find it best to use:

with open('big.txt') as f:
  for line_idx, line in enumerate(f):
    print(line)

If you need a memory of previous lines, just store them in a list. If you need to know future lines to decide what to do with the current line, store the current line in a list until you get to that future line...
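A minimal sketch of the "memory of previous lines" idea, swapping the plain list for collections.deque with maxlen so that only the last k lines are kept and older ones are discarded automatically (with_history and k are hypothetical names chosen for illustration; io.StringIO stands in for a real file):

```python
import io
from collections import deque

def with_history(infile, k):
    """Yield (line, previous_lines) pairs, remembering at most k earlier lines."""
    history = deque(maxlen=k)  # appending beyond k drops the oldest entry
    for line in infile:
        yield line, list(history)
        history.append(line)

f = io.StringIO("a\nb\nc\nd\n")
for line, prev in with_history(f, 2):
    print(repr(line), prev)
```

Memory use stays bounded by k lines no matter how large the file is.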

edited Nov 19 '20 at 13:02

answered Jun 28 '18 at 12:39

duhaime

The link given above seems broken; also, I could not match it to any of your other repos on GitHub. There is a version available on https://pypi.org/project/bigread, but it looks like it is no longer maintained? – antiplex Nov 19 '20 at 12:57
Yes it's no longer maintained :/ I updated the answer above to show how I approach this problem now; I hope this helps! – duhaime Nov 19 '20 at 13:03

score 2 · Answer 5 · answered Apr 29 '11 at 13:50

maybe:

lines = []
for _ in range(N):
    lines.append(f.readline())  # f is the open file; readline() returns '' after EOF

answered Apr 29 '11 at 13:50

yurib

score 2 · Answer 6 · edited May 23 '17 at 12:25

I think you should read the file in fixed-size chunks instead of specifying the number of lines to read. It makes your code more robust and generic: even if the lines are very long, each read loads only the assigned amount of data into memory.

Refer to this link
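The link is missing here, so the following is only an assumption of what reading "in chunks" means: a minimal sketch that reads fixed-size blocks with file.read(size) until it returns an empty string (read_chunks and the 64 KiB default are hypothetical choices). A chunk boundary can fall in the middle of a line, so line-oriented processing would have to carry the partial last line over to the next chunk:

```python
import io

def read_chunks(infile, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size characters (or bytes)."""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:  # '' in text mode, b'' in binary mode: EOF reached
            break
        yield chunk

f = io.StringIO("abcdefg")
print(list(read_chunks(f, 3)))  # ['abc', 'def', 'g']
```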

edited May 23 '17 at 12:25

Community

answered Apr 29 '11 at 13:54

Konstant

score 1 · Answer 7 · answered Apr 29 '11 at 13:50

How about a for loop?

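with open(filename, 'r') as infile:
    while True:
        lines = []
        for i in range(N):  # N is the number of lines per batch
            line = infile.readline()
            if not line:  # readline() returns '' at EOF
                break
            lines.append(line)
        if not lines:  # nothing left to read
            break
        process(lines)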

answered Apr 29 '11 at 13:50

Spencer Rathbun

what is this syntax "next N lines", pseudocode? python noob here – Colin D May 26 '17 at 17:51
@ColinD it's just the number of lines you want. For instance 7 lines would be `for i in range(7)` – Spencer Rathbun Jun 01 '17 at 18:30

quamrana · Answer 8 · 2011-04-30T09:16:31.173

You may have to do something as simple as:

lines = [infile.readline() for _ in range(N)]

Update after comments:

lines = [line for line in [infile.readline() for _ in range(N)] if line]

edited Apr 30 '11 at 09:16

answered Apr 29 '11 at 13:50

quamrana

Your code has no checking on line count. For example, if the line count is smaller than N, you will get an error. – Anatolij Apr 29 '11 at 13:53
@Anatolij: You're right that there is no checking - but you just get empty strings after EOF and no error. – quamrana Apr 29 '11 at 13:55
You will need to check each item in `process()`, so this is overhead. – Anatolij Apr 29 '11 at 13:57

Q. Qiao · Answer 9 · 2022-02-15T19:15:41.933

def get_lines_iterator(filename, n=10):
    with open(filename) as fp:
        lines = []
        for i, line in enumerate(fp):
            if i % n == 0 and i != 0:
                yield lines 
                lines = []
            lines.append(line)
    if lines:
        yield lines 

for lines in get_lines_iterator(filename):
    print(lines)

It is simpler with islice:

from itertools import islice

def get_lines_iterator(filename, n=10):
    with open(filename) as fp:
        while True:
            lines = list(islice(fp, n))
            if lines:
                yield lines
            else:
                break

for lines in get_lines_iterator(filename):
    print(lines)

Another way to do this:

from itertools import islice

def get_lines_iterator(filename, n=10):
    with open(filename) as fp:
        for line in fp:
            yield [line] + list(islice(fp, n-1))
           

for lines in get_lines_iterator(filename):
    print(lines)

score 0 · Answer 10 · edited Nov 01 '17 at 22:38

If you can afford to read the full file into memory ahead of time:

with open(filename, 'r') as f:
    infile = f.readlines()  # the whole file is now in memory
my_block = [line.strip() for line in infile[:N]]
cur_pos = 0
while my_block:
    print(my_block)
    cur_pos += 1
    my_block = [line.strip() for line in infile[cur_pos*N:(cur_pos + 1)*N]]
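A minimal sketch of the same idea using range(0, len(lines), N) to compute the slice offsets directly instead of tracking cur_pos (blocks_from_list is a hypothetical name; like the code above, it assumes the whole file fits in memory, and io.StringIO stands in for a real file):

```python
import io

def blocks_from_list(all_lines, n):
    """Split an in-memory list of lines into stripped blocks of at most n."""
    return [
        [line.strip() for line in all_lines[start:start + n]]
        for start in range(0, len(all_lines), n)
    ]

lines = io.StringIO("a\nb\nc\nd\ne\n").readlines()
print(blocks_from_list(lines, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```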

edited Nov 01 '17 at 22:38

0TTT0

answered Nov 01 '17 at 21:51

ChrisEisenhart

Haromn · Answer 11 · 2018-08-22T09:13:24.243

I was looking for an answer to the same question, but did not really like any of the proposed stuff earlier, so I ended up writing this slightly ugly thing that does exactly what I wanted ~~without using strange libraries~~.

def test(filename, N):
    with open(filename, 'r') as infile:
        lines = []
        for line in infile:
            line = line.strip()
            if len(lines) < N-1:
                lines.append(line)
            else:
                lines.append(line)
                res = lines
                lines = []
                yield res  # yield only once a full batch of N lines is ready
        else:
            if len(lines) != 0:
                yield lines  # leftover lines (fewer than N) at EOF

edited Aug 22 '18 at 09:13

answered Aug 21 '18 at 07:22

Haromn

itertools is in Python standard library – madprogrammer Aug 22 '18 at 08:07
fair enough, itertools is fine, I did not feel comfortable about islice. – Haromn Aug 22 '18 at 09:17
