I am trying to write a script that cleans unnecessary characters from a data .txt file. The script ran successfully once, but every attempt since then fails with this error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa2 in position 8149: invalid start byte
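The failing byte can be reproduced on its own. Below is a minimal sketch that assumes nothing about the real file beyond the byte value in the message; the sample bytes in `bad` are made up for illustration. 0xA2 is a UTF-8 continuation byte, so it can never begin a character. In single-byte encodings such as cp1252 or Latin-1, the same byte is the cent sign.

```python
# 0xA2 is 0b10100010: its top two bits are 10, which marks a UTF-8
# continuation byte, so it can never start a character.
bad = b"price: \xa2 5"  # hypothetical sample containing the byte

try:
    bad.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.start, err.reason)  # 7 invalid start byte

# In single-byte Windows/ISO encodings the same byte is the cent sign.
print(bad.decode("cp1252"))   # price: ¢ 5
print(bad.decode("latin-1"))  # price: ¢ 5
```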
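
    import codecs
    import sys

    # Input file: first command-line argument, or test.txt by default
    if len(sys.argv) < 2:
        startFile = "test.txt"
    else:
        startFile = sys.argv[1]
    finishFile = "newtest.txt"

    def cleanFile():
        # Attempted alternative, which gave a similar error:
        #f = codecs.open("GNMFDB.TXT", "r", "utf-8")
        # open() without encoding= uses the locale default (UTF-8 here);
        # "a" appends to the output file that clearNewFile() emptied
        with open(startFile, "r") as f, open(finishFile, "a") as newFile:
            for line in f:
                # Drop every "=" character and copy the rest unchanged
                line = line.replace("=", "")
                newFile.write(line)

    def clearNewFile():
        # Opening in "w" mode truncates the output file to zero length
        with open(finishFile, "w"):
            pass

    if __name__ == "__main__":
        #startFile = "test.txt"
        #finishFile = "newtest.txt"
        clearNewFile()
        cleanFile()
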
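One way to see what the offending byte actually is: read the file in binary mode, decode it yourself, and look at the bytes around the first failure. This is a sketch, and `first_utf8_error` is a hypothetical helper that is not part of the original script. Because it decodes the whole file at once, the returned offset is a position in the file itself. That offset may differ from the 8149 reported by the text-mode reader, which decodes the file in chunks.

```python
def first_utf8_error(path, context=16):
    """Return (offset, nearby_bytes) for the first byte in path that is not
    valid UTF-8, or None if the whole file decodes cleanly."""
    with open(path, "rb") as f:  # binary mode: no decoding happens here
        raw = f.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is an offset into raw, i.e. into the file
        lo = max(0, err.start - context)
        return err.start, raw[lo:err.end + context]
    return None
```

Calling `print(first_utf8_error("test.txt"))` on the original file shows the raw bytes next to the 0xa2, which usually hints at what the file's real encoding is.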
I suspect the problem is that Python decodes the file's bytes as UTF-8 when reading them into strings, or something along those lines. If I copy some lines from the original .txt file into a separate .txt file I created in vim, the script runs successfully every time. I know codecs could be used in a situation like this, but when I tried it, it gave me a similar error (hence the commented-out line).
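A similar error from `codecs.open(..., "utf-8")` would be expected if the file is not UTF-8 at all, since that call still uses the UTF-8 decoder. If the file is in a single-byte encoding such as Latin-1 or cp1252, passing `encoding=` to `open()` for both files avoids the error. That encoding is an assumption, because the question doesn't say where the file came from. Here is a minimal sketch with a hypothetical `clean_file` function. It opens the output with `"w"`, so a separate `clearNewFile()` step isn't needed.

```python
def clean_file(src, dst, encoding="latin-1"):
    """Copy src to dst, dropping every "=" character.

    Assumption: the input uses a single-byte encoding (Latin-1 here).
    Latin-1 maps every byte 0x00-0xFF to a character, so decoding never
    fails, and writing with the same encoding reproduces all other bytes
    exactly. newline="" keeps the original line endings untouched.
    """
    with open(src, "r", encoding=encoding, newline="") as f, \
         open(dst, "w", encoding=encoding, newline="") as out:
        for line in f:
            out.write(line.replace("=", ""))
```

Usage would be `clean_file("test.txt", "newtest.txt")`. Latin-1 can decode any byte sequence, so choosing it cannot raise an error, but it also cannot confirm the guess. Non-ASCII characters may display wrongly if the real encoding is different. The "=" removal and the byte-for-byte round trip of everything else still hold either way.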