I'm writing a Python program that reads the .txt files it is given and removes any numbers, commas, and certain words. It's for transcribing phone calls, so the words are fillers like "um" and "uh", which are unnecessary. The result is written to a new text file that contains everything except the removed data.
The code I have produced works, but it also removes those words from inside any longer word that contains them: for example, "momentum" becomes "moment" because it contains "um". Here is the code:
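infile = "testfile.txt"
outfile = "cleanedfile.txt"

# Each digit must be its own string: str([1, 2, ...]) yields the single
# string "[1, 2, ...]", which never matches anything in the text.
digits = [str(n) for n in range(10)]
deleteList = [",", "Um", "um", "Uh", "uh"] + digits

fin = open(infile)
fout = open(outfile, 'w+')
for line in fin:
    for word in deleteList:
        # Plain substring replacement: this also hits "um" inside "momentum"
        line = line.replace(word, "")
    fout.write(line)
fin.close()
fout.close()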
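One direction that might avoid the substring problem is matching whole words only, using the `\b` word-boundary anchor from Python's `re` module. The following is a minimal sketch under assumptions, not a tested solution for real transcripts: the filler list is taken from the examples above, and `clean_line` and `clean_file` are hypothetical helper names introduced only for illustration.

```python
import re

# Filler words to drop (assumed from the examples in the question).
FILLERS = ["um", "uh"]

# \b is a word boundary, so "um" inside "momentum" is left alone.
# re.IGNORECASE also covers "Um" and "Uh".
FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in FILLERS) + r")\b",
    re.IGNORECASE,
)
# Digits and commas are removed wherever they appear.
DIGITS_COMMAS_RE = re.compile(r"[0-9,]")


def clean_line(line):
    """Hypothetical helper: strip fillers, digits and commas from one line."""
    line = FILLER_RE.sub("", line)
    line = DIGITS_COMMAS_RE.sub("", line)
    # Collapse runs of spaces or tabs left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", line)


def clean_file(infile, outfile):
    """Hypothetical helper: write a cleaned copy of infile to outfile."""
    # "with" closes both files even if an error occurs part-way through.
    with open(infile) as fin, open(outfile, "w") as fout:
        for line in fin:
            fout.write(clean_line(line))
```

Calling `clean_file("testfile.txt", "cleanedfile.txt")` would mirror the loop above. Note that a filler glued to a digit, such as "um3", has no word boundary between the letters and the digit, so it would not be matched as a filler in this sketch.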
Any help would be greatly appreciated.