So I have a problem. I am working in Python 3 with .txt files whose line counts are always a multiple of 4.
I wrote code that is meant to take the 2nd and 4th line of every group of four lines and keep only the first 20 characters of those two lines, while leaving the 1st and 3rd lines unedited. It then writes a new file made up of the edited 2nd and 4th lines and the unedited 1st and 3rd lines. The same pattern applies to every group of four, since the line count of every file I work with is a multiple of 4.
This works on small files (~100 lines total), but the files I need to edit have 50 million+ lines, and processing one takes 4+ hours.
Below is my code. Can anyone give me a suggestion on how to speed up my program? Thanks!
```python
import io
import os
import sys

newData = ""
i=0
run=0
# indices of the 1st, 2nd, 3rd and 4th line of the current 4-line group
j=0
k=1
m=2
n=3

seqFile = open('temp100.txt', 'r')
seqData = seqFile.readlines()  # reads every line into memory at once
while i < 14371315:  # number of 4-line groups in the file
    sLine1 = seqData[j]
    editLine2 = seqData[k]
    sLine3 = seqData[m]
    editLine4 = seqData[n]
    # keep only the first 20 characters of lines 2 and 4
    tempLine1 = editLine2[0:20]
    tempLine2 = editLine4[0:20]
    newLine1 = editLine2.replace(editLine2, tempLine1)
    newLine2 = editLine4.replace(editLine4, tempLine2)
    newData = newData + sLine1 + newLine1 + '\n' + sLine3 + newLine2
    if len(seqData[k]) > 20:
        newData += '\n'
    i=i+1
    run=run+1
    j=j+4
    k=k+4
    m=m+4
    n=n+4
    print(run)  # progress counter
seqFile.close()
new = open("new_100temp.txt", "w")
sys.stdout = new
print(newData)
```
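For comparison, the transformation described above can be restated as a minimal streaming sketch. It assumes that lines keep their trailing newline (as they do when iterating over a text file) and that "first 20 characters" excludes that newline. The names `trim_lines`, `trim_file` and `KEEP` are hypothetical, not part of the original code.

```python
from typing import Iterable, Iterator

# Assumption: number of characters to keep on the 2nd and 4th line of each group.
KEEP = 20


def trim_lines(lines: Iterable[str], keep: int = KEEP) -> Iterator[str]:
    """Yield every line, truncating the 2nd and 4th line of each 4-line group.

    The trailing newline is stripped before truncation and added back, so
    truncated lines always end with exactly one newline.
    """
    for index, line in enumerate(lines):
        if index % 4 in (1, 3):
            # 0-based positions 1 and 3 are the 2nd and 4th line of the group
            yield line.rstrip("\n")[:keep] + "\n"
        else:
            # 1st and 3rd lines pass through unchanged
            yield line


def trim_file(src_path: str, dst_path: str, keep: int = KEEP) -> None:
    """Read src_path line by line and write the trimmed lines to dst_path."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(trim_lines(src, keep))


if __name__ == "__main__":
    sample = ["line one\n", "0123456789abcdefghijKLMNOP\n", "line three\n", "short\n"]
    # Line 2 is cut to "0123456789abcdefghij"; "short" is under 20 chars and kept as-is.
    print("".join(trim_lines(sample)), end="")
```

With the file names from the question, this would be called as `trim_file('temp100.txt', 'new_100temp.txt')`. Because it iterates over the file object and writes as it goes, it never holds the whole file in memory and avoids the repeated `newData = newData + ...` concatenation, which in CPython typically copies the growing string on each iteration.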