Since the input file can be extremely large, my main concern is memory efficiency, so I use a two-pass algorithm. The first pass counts the lines in the input file (one line at a time, so nothing large is held in memory) and computes how many lines should be apportioned to each output file. The second pass rereads the input one line at a time, writing to each output file in turn for the required number of lines:
def split(infile, outfiles, n_files=8):
    """Split a file into n_files pieces.

    :param infile: name of the input file
    :type infile: str
    :param outfiles: list of output file names of length n_files
    :type outfiles: list
    :param n_files: number of roughly equal pieces the input file should be split into
    :type n_files: int
    """
    # First pass: get the total number of lines.
    with open(infile, 'r') as f:
        num_lines = sum(1 for line in f)
    lines_per_file = num_lines // n_files
    if lines_per_file == 0 and num_lines > 0:
        lines_per_file = 1
    # Compute the number of lines to go into each file:
    count = []
    for i in range(n_files - 1):
        lines_per_file = min(lines_per_file, num_lines)
        count.append(lines_per_file)
        num_lines -= lines_per_file
    # The last file gets anything that is left over:
    count.append(num_lines)
    # Second pass: copy the lines to the output files.
    with open(infile, 'r') as f1:
        for i in range(n_files):
            with open(outfiles[i], 'w') as f2:
                for j in range(count[i]):
                    f2.write(f1.readline())
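To make the behaviour concrete, here is a small self-contained sketch of how I exercise the function: it writes a 10-line file into a temporary directory and splits it into 3 pieces, which should come out as 3, 3, and 4 lines (the file and directory names are just placeholders for this demonstration; the `split` function is repeated so the snippet runs on its own):

```python
import os
import tempfile

def split(infile, outfiles, n_files=8):
    """Split a file into n_files pieces (same algorithm as above)."""
    with open(infile, 'r') as f:
        num_lines = sum(1 for line in f)
    lines_per_file = num_lines // n_files
    if lines_per_file == 0 and num_lines > 0:
        lines_per_file = 1
    count = []
    for i in range(n_files - 1):
        lines_per_file = min(lines_per_file, num_lines)
        count.append(lines_per_file)
        num_lines -= lines_per_file
    count.append(num_lines)
    with open(infile, 'r') as f1:
        for i in range(n_files):
            with open(outfiles[i], 'w') as f2:
                for j in range(count[i]):
                    f2.write(f1.readline())

# Build a 10-line input file in a temporary directory.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'input.txt')
with open(src, 'w') as f:
    for i in range(10):
        f.write('line %d\n' % i)

# Split it into 3 pieces and count the lines in each piece.
outs = [os.path.join(tmpdir, 'part%d.txt' % i) for i in range(3)]
split(src, outs, n_files=3)

sizes = []
for name in outs:
    with open(name) as f:
        sizes.append(sum(1 for _ in f))
```

With 10 lines over 3 files, `10 // 3 == 3`, so the first two pieces get 3 lines each and the last piece absorbs the remaining 4 — the remainder always lands in the final file.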