I have a large text file that I'd like to turn into a list of words. I've been able to get as far as getting a list for each line in the file, but I want a single list.
Here's what I have:
import unicodedata
import codecs
infile = codecs.open('FILE.txt', 'r', encoding='ascii', errors='ignore')
outfile = codecs.open('FILE2.txt', 'w', encoding='ascii', errors='ignore')
for word in infile:
    mylist = str(word.split())
    outfile.write(mylist)
infile.close()
outfile.close()
This gives me an outfile that looks like:
[word, word][word, word, word, word][word, word]...[word,word]
I am hoping to get an outfile that looks like:
[word, word, word, .... word, word, word]
I know how to concatenate multiple lists, but each of these lists is written to my outfile as soon as it is created. As written, my code does not let me concatenate the lists after the fact.
UPDATE:
Thank you for all of your help. I have solved the problem with the following:
import unicodedata
import codecs
infile = codecs.open('FILE1.txt', 'r', encoding='ascii', errors='ignore')
outfile = codecs.open('FILE2.txt', 'w', encoding='ascii', errors='ignore')
mylist = []
for line in infile:
    for word in line.split():
        mylist.append(word)
outfile.write(str(mylist))
infile.close()
outfile.close()
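For anyone landing here later, the same idea can probably be written a bit more compactly. This is only a sketch and assumes Python 3, where the built-in open() accepts encoding and errors directly. The function name collect_words and its parameters are made up for illustration. It uses list.extend() in place of the inner loop, and with blocks so the files are closed automatically.

```python
def collect_words(in_path, out_path):
    """Hypothetical helper: gather every word in in_path into one list
    and write that single list to out_path."""
    words = []
    # errors='ignore' drops any non-ASCII characters, as in the code above
    with open(in_path, 'r', encoding='ascii', errors='ignore') as infile:
        for line in infile:
            # extend() appends each word of this line to the one running list
            words.extend(line.split())
    with open(out_path, 'w', encoding='ascii', errors='ignore') as outfile:
        # write the list once, after all lines have been processed
        outfile.write(str(words))
    return words
```

Calling collect_words('FILE1.txt', 'FILE2.txt') should give the same single-list output as the loop above. The key point in both versions is that the list is built up across all lines and written only once, after the loop finishes.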