I hope I am clear. I am trying to write a Python program that goes through the first file and determines which words are repeated. To determine whether words are repeated, the contents of the file must be stripped of punctuation and converted to lower case. After this is done, the program writes the repeated words to the second text file. Each repeated word is to be written only once in the second file.
Below, I've made an attempt and I ran into two errors.
Error one: I've noticed that the punctuation-stripping step (the call to strip with string.punctuation) does not remove all the punctuation.
Error two: The repeated words are written to the second file as many times as they appear in the original. I attempted to use a break statement if the word already existed in the file, but it somehow gets bypassed.
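A note on error one that may help narrow it down: `str.strip` only removes characters from the two ends of a string, so punctuation in the middle of the text is left alone. Below is a minimal sketch of the difference, using `str.translate` with a table from `str.maketrans` (Python 3) as one possible alternative. The sample string and the helper name `remove_punctuation` are made up for illustration and are not part of the code in this question.

```python
import string


def remove_punctuation(text):
    """Delete every ASCII punctuation character, wherever it appears."""
    # The third argument of str.maketrans lists the characters to delete.
    table = str.maketrans('', '', string.punctuation)
    return text.translate(table)


sample = "Sit! Sit! So we sat."

# strip() only trims the ends: the trailing '.' goes, the inner '!' stay.
stripped_ends = sample.strip(string.punctuation)

# translate() removes punctuation everywhere in the string.
cleaned = remove_punctuation(sample)

print(stripped_ends)
print(cleaned)
```

Note that this approach also deletes apostrophes and hyphens, so a word like "don't" would become "dont". Whether that is acceptable depends on the text.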
Below is my code.
def repeatWords(filename_1, filename_2):
    infile_1=open(filename_1,'r')
    content_1=infile_1.read()
    infile_1.close()
    import string
    content_1=content_1.strip(string.punctuation) # this did not remove all punctuations
    content_1=content_1.lower()
    content_1=content_1.split()
    outfile=open(filename_2,'w')
    outfile.write('') #used to create second file, assuming it does not exist
    outfile.close()
    outfile=open(filename_2,'r+')
    write_content=outfile.read()
    for word in content_1:
        write_content=outfile.read()
        if content_1.count(word)>1:
            if word in write_content:
                break
            else:
                outfile.write(word)
                outfile.write('\n')
    outfile.close()
    # after this is executed, the words repeat as many times as they appear
    infile_2=open(filename_2,'r')
    content_2=infile_2.read()
    infile_2.close()
    return content_2

inF = 'catInTheHat.txt'
outF = 'catRepWords.txt'
print(repeatWords(inF, outF))
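A possible explanation for error two, offered as a guess rather than a confirmed diagnosis: in a file opened with 'r+', `read()` starts from the current file position. After a `write()`, that position is at the end of the file, so the `read()` inside the loop would return an empty string, `word in write_content` would never be true, and the `break` would never be reached. (If it were reached, `break` would end the whole loop rather than skip one word; `continue` skips a single iteration.) The sketch below shows the file-position behaviour on a throwaway temporary file, which is purely an assumption for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'r+') as f:
    f.write('sit\n')
    after_write = f.read()  # position is at the end, nothing left to read
    f.seek(0)               # rewind to the start of the file
    after_seek = f.read()   # now the written text is visible

os.remove(path)

print(repr(after_write))
print(repr(after_seek))
```

Keeping the already-written words in a Python set, instead of re-reading the output file, would sidestep the file position entirely and also avoid substring matches such as "to" inside "too".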
The contents of the first file are:
Too wet to go out and too cold to play ball.
So we sat in the house.
We did nothing at all.
So all we could do was to Sit! Sit! Sit! Sit!
Screenshot link --> http://oi59.tinypic.com/hrln3r.jpg
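For comparison, here is one way the whole task could be sketched, assuming Python 3. It counts the words with `collections.Counter` and uses a set to make sure each repeated word is emitted only once. The names `repeated_words`, `repeat_words` and `SAMPLE` are hypothetical and chosen for this example; this is a sketch under those assumptions, not the only way to do it.

```python
import string
from collections import Counter


def repeated_words(text):
    """Return the words that occur more than once, in first-appearance order."""
    # Lower-case the text and delete ASCII punctuation everywhere.
    table = str.maketrans('', '', string.punctuation)
    words = text.lower().translate(table).split()
    counts = Counter(words)

    seen = set()   # repeated words already collected
    result = []
    for word in words:
        if counts[word] > 1 and word not in seen:
            seen.add(word)
            result.append(word)
    return result


def repeat_words(filename_1, filename_2):
    """Write the repeated words of filename_1 to filename_2, one per line."""
    with open(filename_1, 'r') as infile:
        repeated = repeated_words(infile.read())
    content = ''.join(word + '\n' for word in repeated)
    # Mode 'w' creates the file if it does not exist and empties it otherwise.
    with open(filename_2, 'w') as outfile:
        outfile.write(content)
    return content


# The four lines quoted in the question, as an in-memory sample.
SAMPLE = (
    "Too wet to go out and too cold to play ball.\n"
    "So we sat in the house.\n"
    "We did nothing at all.\n"
    "So all we could do was to Sit! Sit! Sit! Sit!"
)

print(repeated_words(SAMPLE))
```

With the four sample lines, the repeated words come out as too, to, so, we, all, sit. Calling `repeat_words('catInTheHat.txt', 'catRepWords.txt')` would apply the same logic to the files named in the question.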