How to read lots of lines from a file at once

Question

I want to generate a bunch of files based on a template. The template has thousands of lines. For each new file, only the top 5 lines are different. What is the best way to read all the lines except the first 5 at once, instead of reading the whole file line by line?

you mean: you want to read the 5 first lines one by one, and then the rest? — Jean-François Fabre, Feb 06 '17 at 21:25
that's because 500 hundreds lines is not very idiomatic in english (in french it isn't either). I'll edit the question. — Jean-François Fabre, Feb 06 '17 at 21:32

score 3 · Accepted Answer · edited May 23 '17 at 10:29

One approach is to put the first 5 lines in a list and read the rest into one big buffer:

with open("input.txt") as f:
    first_lines = [f.readline() for _ in range(5)]
    rest_of_lines = f.read()

Or, to make the first part more symmetrical, build one small buffer from the 5 lines instead (this line replaces the list comprehension inside the with block):

first_lines = "".join([f.readline() for _ in range(5)])
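Since the goal in the question is to generate many files that share everything but the header, the split above can be done once and the body reused. A rough sketch, assuming each new file gets its own 5-line header as a single string; the names split_template, write_variants, n_header and variants are made up for illustration and do not come from the answer:

```python
def split_template(path, n_header=5):
    """Return (header_lines, body) for the file at `path`."""
    with open(path) as f:
        # Read the header one line at a time...
        header_lines = [f.readline() for _ in range(n_header)]
        # ...then everything after it in a single call.
        body = f.read()
    return header_lines, body


def write_variants(template_path, variants, n_header=5):
    """Write one file per (out_path, new_header) pair in `variants`.

    `variants` is a hypothetical dict mapping an output path to the
    replacement header text (including its trailing newline).
    """
    # The template is read only once; the body string is shared by all outputs.
    _, body = split_template(template_path, n_header)
    for out_path, new_header in variants.items():
        with open(out_path, "w") as out:
            out.write(new_header)
            out.write(body)
```

The body is held in memory once and written as-is to every output, so no per-line work is done on the thousands of unchanged lines.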

Alternatively, from a purely I/O point of view, the quickest option would be to read everything in one call:

with open("input.txt") as f:
    lines = f.read()

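and then use a line-splitting generator to pull out the first 5 lines. splitlines() would be disastrous in terms of memory copying, because it builds a list holding a copy of every line (the answer links to an implementation of such a generator).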
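The linked implementation is not reproduced here. As a rough idea of what a line-splitting generator might look like, here is a sketch that assumes lines end with "\n" (which is what text mode gives you by default); iter_lines and first_n_lines are hypothetical names:

```python
from itertools import islice


def iter_lines(text):
    """Yield the lines of `text` one at a time, keeping line endings.

    Unlike text.splitlines(), this never builds a list of all lines;
    it only copies the one line it is currently yielding.
    """
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        if end == -1:
            # Last line has no trailing newline.
            yield text[start:]
            return
        yield text[start:end + 1]
        start = end + 1


def first_n_lines(text, n=5):
    """Return the first `n` lines of `text` as a list, stopping early."""
    return list(islice(iter_lines(text), n))
```

Because the generator is lazy, first_n_lines stops scanning after the fifth newline and never touches the rest of the buffer.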

answered Feb 06 '17 at 21:28

Jean-François Fabre

Does this actually do much overall? From what I have pieced together from my reading (so I could certainly be wrong), `read()` buffers lines anyway, so might it be quicker just to `read` the file in one go for the sake of 5 lines? – roganjosh Feb 06 '17 at 21:42
Maybe the I/O would be quicker, but after that you'd have to split the contents into the first 5 lines and the rest, which would double the memory required. – Jean-François Fabre Feb 06 '17 at 21:43
Interesting, hadn't thought about it like that. – roganjosh Feb 06 '17 at 21:45

score 1 · Answer 2 · answered Feb 06 '17 at 21:39

File objects in Python are conveniently their own iterators, so when you write for line in f: ... you get the file line by line. The file object keeps track of where you're reading from (generally referred to as a cursor). Each iteration of the for loop advances this cursor past the next newline and returns what it has read. If you break out of the loop before the end of the file, you can pick up where you left off with another loop, or with a single call to f.read() to read the rest of the file:

with open(inputfile, 'r') as f:
    lineN = 0
    header = ""
    for line in f:
        header = header + line
        lineN += 1
        if lineN >= 5:  # stop once the first 5 lines have been read
            break
    body = f.read()  # read the rest of the file
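The same pattern can be written more compactly with itertools.islice, which takes the first few lines from the file iterator and leaves the cursor where the header ends. A minimal sketch, assuming Python 3 (in Python 2, mixing file iteration with read() was unreliable because of the iterator's read-ahead buffer); the function name read_header_and_body and the parameter n_header are made up for illustration:

```python
from itertools import islice


def read_header_and_body(path, n_header=5):
    """Return (header, body) as two strings for the file at `path`."""
    with open(path, "r") as f:
        # islice stops after n_header lines without consuming any more.
        header = "".join(islice(f, n_header))
        # The file object continues from where the iteration stopped.
        body = f.read()
    return header, body
```

If the file has fewer than n_header lines, header simply contains whatever lines exist and body is an empty string.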
