I'm looking for a way to read from two large files simultaneously without loading all of the data into memory. I want to process M lines from the first file together with N lines from the second file. Is there a sensible, memory-efficient way to do this?

So far I know how to read two files at the same time, line by line. But I don't know whether this code can be extended to read, for example, 4 lines from the first file and 1 line from the second file.

from itertools import izip  # Python 2; on Python 3 the built-in zip is already lazy

with open("textfile1") as textfile1, open("textfile2") as textfile2:
    for x, y in izip(textfile1, textfile2):
        x = x.strip()
        y = y.strip()
        print("{0}\t{1}".format(x, y))

This code is from the question "Read two textfile line by line simultaneously -python".

user2373198

2 Answers

Just open both files and call, e.g., line = textfile1.readline() whenever you need the next line from one of them.

The returned line keeps its trailing newline. You know you have reached the end of the file when readline() returns an empty string.
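
A minimal sketch of this readline() approach, reading 4 lines from the first file and 1 line from the second on each pass. The helper name read_block and the temporary demo files are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile


def read_block(handle, count):
    """Read up to `count` lines with readline(), stopping at end of file."""
    lines = []
    for _ in range(count):
        line = handle.readline()
        if line == "":  # an empty string signals end of file
            break
        lines.append(line.rstrip("\n"))
    return lines


# Hypothetical demo data written to temporary files.
tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "textfile1")
path2 = os.path.join(tmpdir, "textfile2")
with open(path1, "w") as f:
    f.write("".join("a{}\n".format(i) for i in range(6)))
with open(path2, "w") as f:
    f.write("b0\nb1\n")

pairs = []
with open(path1) as textfile1, open(path2) as textfile2:
    while True:
        block1 = read_block(textfile1, 4)  # M = 4 lines from the first file
        block2 = read_block(textfile2, 1)  # N = 1 line from the second file
        if not block1 and not block2:
            break  # both files are exhausted
        pairs.append((block1, block2))

print(pairs)
```

Only the current block of lines is held in memory at any time, so this should scale to large files. A blank line in the input comes back as "\n" rather than "", so it is not mistaken for the end of the file.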

fafl

The helper below reads the next n lines from file1 and then the next m lines from file2; it can be called from within other code.

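def nextlines(number, file):
    # Collect up to `number` stripped lines from an open file object.
    n_items = []
    index = number
    while index > 0:
        try:
            n_items.append(next(file).strip())
            index -= 1
        except StopIteration:
            # The file ran out before `number` lines were read.
            break
    return n_items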

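n = 5
m = 7
l1 = []
l2 = []
# 'file.dat' is opened twice here; substitute the paths of your two files.
with open('file.dat', 'r') as file1, open('file.dat', 'r') as file2:
    # some code
    l1 = nextlines(n, file1)
    l2 = nextlines(m, file2)
    # some other code
# No explicit close() calls are needed: the with statement closes both files.
print(l1)
print(l2)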
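
The same idea can be sketched as a generator built on itertools.islice, which yields one pair of chunks per pass until both files run out. This assumes Python 3; the name paired_chunks is hypothetical, and StringIO objects stand in for real open files so the example runs on its own.

```python
from io import StringIO
from itertools import islice


def paired_chunks(file1, n, file2, m):
    """Yield (chunk1, chunk2) with up to n lines from file1 and m from file2."""
    while True:
        # islice pulls at most n (or m) lines and leaves the rest unread.
        chunk1 = [line.strip() for line in islice(file1, n)]
        chunk2 = [line.strip() for line in islice(file2, m)]
        if not chunk1 and not chunk2:
            return  # both inputs are exhausted
        yield chunk1, chunk2


# Stand-ins for two open text files (assumption for the demo).
file1 = StringIO("a1\na2\na3\na4\na5\na6\na7\na8\n")
file2 = StringIO("b1\nb2\n")

result = list(paired_chunks(file1, 4, file2, 1))
print(result)
```

Because file objects are their own iterators, each islice call resumes where the previous one stopped, so only n + m lines are in memory at once. If one file ends first, its chunk is simply an empty list on the remaining passes.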