I am currently using a C++ program with a Python wrapper to manipulate a large (15 GB) text file line by line. Effectively, it reads a line from input.txt, processes it, then writes the result to output.txt. I am using the straightforward loop below (inp is opened on input.txt, out on output.txt):
    for line in inp:
        result = operate(line)
        out.write(result)
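For reference, here is a self-contained toy version of that loop: operate is stubbed out (the real one calls into the C++ code), and the files are replaced by in-memory StringIO buffers so it can be run as-is:

```python
from io import StringIO

def operate(line):
    # stand-in for the real per-line processing (the actual one calls the C++ code)
    return line.upper()

# in-memory stand-ins for input.txt and output.txt
inp = StringIO(u"first line\nsecond line\n")
out = StringIO()

for line in inp:
    result = operate(line)
    out.write(result)

print(out.getvalue())
```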
However, because of issues in the C++ code, it has some failure rate, which causes the loop to stop after about ten million iterations. This leaves me with an output file built from only about 10% of the input.
Since I have no means of fixing the original program, I thought about simply restarting it where it stopped. I counted the lines of output.txt, created another file called output2.txt, and ran the following code:
    k = 0
    for line in inp:
        if k < 12123253:
            k += 1
        else:
            result = operate(line)
            out2.write(result)
            k += 1
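For concreteness, here is a self-contained toy version of that restart loop, with the skip threshold shrunk to 2 (it is 12123253 in the real run), operate stubbed out, and the files replaced by in-memory buffers:

```python
from io import StringIO

SKIP = 2  # 12123253 in the real run

def operate(line):
    return line.upper()  # stand-in for the real processing

inp = StringIO(u"a\nb\nc\nd\n")
out2 = StringIO()

k = 0
for line in inp:
    if k < SKIP:
        k += 1  # just counting past the already-processed lines
    else:
        out2.write(operate(line))
        k += 1

print(out2.getvalue())
```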
However, while counting the lines of output.txt finished in under a minute, this method takes many hours just to reach the designated line.
Why is this method so inefficient? Is there a faster one? I am on a Windows PC with plenty of resources (72 GB RAM, fast processors), using Python 2.7.