I want to generate a bunch of files based on a template. The template has thousands of lines. For each of the new files, only the top 5 lines are different. What is the best way to read all the lines except the first 5 at once, instead of reading the whole file line by line?
You mean you want to read the first 5 lines one by one, and then the rest? – Jean-François Fabre Feb 06 '17 at 21:25
@Jean-FrançoisFabre yes – ddd Feb 06 '17 at 21:27
I misunderstood the question sorry – roganjosh Feb 06 '17 at 21:29
That's because "500 hundreds lines" is not very idiomatic in English (it isn't in French either). I'll edit the question. – Jean-François Fabre Feb 06 '17 at 21:32
2 Answers
One approach would be to create a list of the first 5 lines, and read the rest into one big buffer:
with open("input.txt") as f:
    first_lines = [f.readline() for _ in range(5)]
    rest_of_lines = f.read()
Or, more symmetrically for the first part, create one small buffer containing the 5 lines:
first_lines = "".join([f.readline() for _ in range(5)])
As an alternative, from a purely I/O point of view, the quickest approach would be:
with open("input.txt") as f:
    lines = f.read()
and then use a line-splitting generator to extract the first 5 lines (splitlines() would be disastrous in terms of memory copying; find an implementation here).
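As an illustration (not the implementation the answer links to), such a generator can walk the string with `str.find` and copy only the lines that are actually requested, instead of splitting the whole text up front. A minimal sketch, assuming the file content is already in a string; `iter_lines` and `first_lines_and_offset` are hypothetical names:

```python
from itertools import islice


def iter_lines(text):
    """Lazily yield the lines of `text`, keeping the newline like readline().

    Only the lines that are consumed get copied; the rest of the string
    is never split.
    """
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        end = len(text) if end == -1 else end + 1  # include the "\n"
        yield text[start:end]
        start = end


def first_lines_and_offset(text, n=5):
    """Return the first `n` lines and the index where the remainder starts."""
    first = list(islice(iter_lines(text), n))
    offset = sum(len(line) for line in first)
    return first, offset
```

The returned offset lets the caller write `text[offset:]` straight to an output file; note that this slice is itself a copy of the remainder, made once.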

Does this actually do much overall? From what I have pieced together from my reading (so I could certainly be wrong), `read()` buffers lines anyway, so it might be quicker to just `read` the file in one go for the sake of 5 lines? – roganjosh Feb 06 '17 at 21:42
Maybe the I/O would be quicker, but after that you'd have to split the contents into the first 5 lines and the rest, which would double the memory required. – Jean-François Fabre Feb 06 '17 at 21:43
File objects in Python are, quite conveniently, their own iterators, so when you write for line in f: ... you get the file line by line. The file object keeps what is generally referred to as a cursor, which tracks where you are reading from. When you use a plain for loop, each iteration advances this cursor past the next newline and returns the line it has read. If you break out of the loop before the end of the file, you can pick up where you left off with another loop, or simply call f.read() to read the rest of the file:
inputfile = "input.txt"  # path to the template file
with open(inputfile, 'r') as f:
    lineN = 0
    header = ""
    for line in f:
        header = header + line
        lineN += 1
        if lineN >= 5:  # lineN counts lines already read: stop after 5
            break
    body = f.read()  # read the rest of the file from the current position