I know it's been a long time since this question was asked, but I still feel like answering it my own way for future viewers and future reference. I'm not sure this is the best way to do it, but it can read multiple files simultaneously, which is pretty cool.
from itertools import islice, chain
from pprint import pprint

def simread(files, nlines_segments, nlines_contents):
    # One list of collected lines per file.
    lines = [[] for _ in range(len(files))]
    total_lines = sum(nlines_contents)
    current_index = 0
    # Cycle through the files until every expected line has been read.
    while len(tuple(chain(*lines))) < total_lines:
        if len(lines[current_index]) < nlines_contents[current_index]:
            # Read the next segment from the current file.
            lines[current_index].extend(islice(
                files[current_index],
                nlines_segments[current_index],
            ))
        current_index += 1
        if current_index == len(files):
            current_index = 0
    return lines
with open('A.txt') as A, open('B.txt') as B:
    lines = simread(
        [A, B],      # files
        [10, 10],    # lines to read at a time from each file
        [100, 100],  # number of lines in each file
    )  # returns two lists containing the lines in files A and B
    pprint(lines)
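The example above assumes A.txt and B.txt already exist with exactly 100 lines each. If you want to try it out, you could generate matching test files first; this setup snippet is my addition, not part of the original recipe, and the line contents are arbitrary:

```python
# Create A.txt and B.txt with 100 numbered lines each so the
# simread example can run as-is.
for name in ('A.txt', 'B.txt'):
    with open(name, 'w') as f:
        f.writelines(f'{name} line {n}\n' for n in range(100))
```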
You can even add another file C (with any number of lines, even a thousand) like so:
with open('A.txt') as A, open('B.txt') as B, open('C.txt') as C:
    lines = simread(
        [A, B, C],         # files
        [10, 10, 100],     # lines to read at a time from each file
        [100, 100, 1000],  # number of lines in each file
    )  # returns three lists containing the lines in files A, B, and C
    pprint(lines)
The values in nlines_segments can also be changed, like so:
with open('A.txt') as A, open('B.txt') as B, open('C.txt') as C:
    lines = simread(
        [A, B, C],         # files
        [5, 20, 125],      # lines to read at a time from each file
        [100, 100, 1000],  # number of lines in each file
    )  # returns three lists containing the lines in files A, B, and C
    pprint(lines)
This would read file A five lines at a time, file B twenty lines at a time, and file C 125 lines at a time.
NOTE: The values provided in nlines_segments all have to be factors of their corresponding values in nlines_contents, which should all be the exact number of lines in the files they correspond to.
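If you don't know the line counts in advance, a variant of the same round-robin idea can simply read until every file is exhausted. This is a sketch of my own, not part of the original answer, and the name simread_any is made up:

```python
from itertools import islice

def simread_any(files, nlines_segments):
    # Round-robin read, no total line counts needed: a file is done
    # when islice returns an empty chunk.
    lines = [[] for _ in files]
    exhausted = [False] * len(files)
    while not all(exhausted):
        for i, f in enumerate(files):
            if exhausted[i]:
                continue
            chunk = list(islice(f, nlines_segments[i]))
            if chunk:
                lines[i].extend(chunk)
            else:
                exhausted[i] = True
    return lines
```

This trades the factor/line-count requirement for one extra empty read per file at the end.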
I hope this helps!