
I want to generate a bunch of files based on a template. The template has thousands of lines, and for each new file only the top 5 lines are different. What is the best way to read all the lines except the first 5 in one go, instead of reading the whole file line by line?

Jean-François Fabre
ddd

2 Answers


One approach is to build a list of the first 5 lines, then read the rest into one big buffer:

with open("input.txt") as f:
    first_lines = [f.readline() for _ in range(5)]
    rest_of_lines = f.read()

Or, more symmetrically for the first part, build one small buffer holding the 5 lines:

first_lines = "".join([f.readline() for _ in range(5)])
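Applied to the original task, the first approach means the template body only has to be read once and can then be reused for every output file. Below is a minimal sketch of that; the function name `generate_files`, its parameters, and the assumption that each new header arrives as a list of 5 newline-terminated strings are hypothetical, not part of the answer.

```python
def generate_files(template_path, headers, out_paths, n_header=5):
    """Write one file per header, all sharing the template's body.

    Hypothetical helper: `headers` is a list of header-line lists, each
    assumed to hold `n_header` newline-terminated strings.
    """
    with open(template_path) as f:
        for _ in range(n_header):
            f.readline()      # skip the template's own header lines
        body = f.read()       # read everything after them in one call

    for header_lines, out_path in zip(headers, out_paths):
        with open(out_path, "w") as out:
            out.writelines(header_lines)  # the part that differs per file
            out.write(body)               # the shared, unchanged part
```

Because `body` is read once outside the loop, the thousands of shared lines are not re-read for each generated file.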

Alternatively, from a purely I/O point of view, the quickest approach would be:

with open("input.txt") as f:
    lines = f.read()

and then use a line-splitting generator to extract the first 5 lines (`splitlines()` would be disastrous in terms of memory copying, since it builds a list holding a copy of every line; find an implementation here).
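The linked implementation isn't reproduced here. As a stand-in, this is a minimal sketch of a lazy line-splitting generator built on `str.find`; the names `iter_lines` and `split_header` are hypothetical. Only the header lines are materialised, although taking the remainder with `text[offset:]` still makes one copy of it.

```python
from itertools import islice


def iter_lines(text):
    """Lazily yield (line, end_offset) pairs from a string.

    Each line keeps its trailing newline, like readline(). Lines past the
    ones actually consumed are never split or copied.
    """
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        end = len(text) if end == -1 else end + 1  # include the newline
        yield text[start:end], end
        start = end


def split_header(text, n=5):
    """Return the first n lines joined, plus the offset where the rest begins."""
    header_lines = []
    offset = 0
    for line, offset in islice(iter_lines(text), n):
        header_lines.append(line)
    return "".join(header_lines), offset


if __name__ == "__main__":
    sample = "1\n2\n3\n4\n5\n6\n7\n"
    header, offset = split_header(sample)
    print(repr(header))           # '1\n2\n3\n4\n5\n'
    print(repr(sample[offset:]))  # '6\n7\n'
```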

Jean-François Fabre
  • Does this actually gain much overall? From what I have pieced together from my reading (so I could certainly be wrong), `read()` buffers lines anyway, so it might be quicker just to `read` the file in one go rather than special-casing 5 lines? – roganjosh Feb 06 '17 at 21:42
  • Maybe the I/O would be quicker, but afterwards you'd have to split the contents into the first 5 lines and the rest, which would double the memory required. – Jean-François Fabre Feb 06 '17 at 21:43
  • Interesting, hadn't thought about it like that. – roganjosh Feb 06 '17 at 21:45
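One way around the memory doubling raised in these comments, assuming the template can be handled as bytes (binary mode), is to find the end of the 5th line with `bytes.find` and take `memoryview` slices, which reference the original buffer instead of copying it. This goes beyond what the answer proposes, and `split_header_bytes` is a hypothetical name.

```python
import io


def split_header_bytes(data, n=5):
    """Split bytes into (first n lines, rest) as zero-copy memoryview slices."""
    pos = 0
    for _ in range(n):
        nl = data.find(b"\n", pos)
        if nl == -1:          # fewer than n lines: everything is header
            pos = len(data)
            break
        pos = nl + 1          # move just past this newline
    view = memoryview(data)
    return view[:pos], view[pos:]


if __name__ == "__main__":
    data = b"h1\nh2\nh3\nh4\nh5\nbody\n"
    header, rest = split_header_bytes(data)
    out = io.BytesIO()          # stands in for open("out.txt", "wb")
    out.write(b"new header\n")
    out.write(rest)             # binary writes accept a memoryview directly
    print(out.getvalue())       # b'new header\nbody\n'
```

The body is never copied into a second string; the only full-size buffer is the one returned by `read()`.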

File objects in Python are conveniently their own iterators, so when you write `for line in f: ...` you get the file line by line. The file object keeps track of where you are reading from with what is generally called a cursor (the file position). With a plain for loop, this cursor advances past the next newline on each iteration and returns what it has read. If you break out of the loop before the end of the file, you can pick up where you left off with another loop, or simply call f.read() to read the rest of the file:

with open(inputfile, 'r') as f:  # inputfile: path to the template
    lineN = 0
    header = ""
    for line in f:
        header = header + line
        lineN += 1
        if lineN >= 5:  # lineN counts lines read so far: stop after 5
            break
    body = f.read()  # read the rest of the file in one call
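The same cursor behaviour can be written more compactly with `itertools.islice`, which stops after 5 lines and leaves the file position right after them. This is a sketch assuming Python 3, where calling `read()` after iterating over a text file continues from the right place; `read_template` is a hypothetical name.

```python
from itertools import islice


def read_template(path, n_header=5):
    """Return (header, body): the first n_header lines and everything after."""
    with open(path) as f:
        header = "".join(islice(f, n_header))  # consumes exactly n_header lines
        body = f.read()                        # continues where islice stopped
    return header, body
```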
Aaron