I have some pretty large text files (>2 GB) that I would like to process word by word. The files are space-delimited text files with no line breaks (all of the words are on a single line). I want to take each word, test whether it is a dictionary word (using enchant), and if so, write it to a new file.
This is my code right now:
import enchant

d = enchant.Dict("en_US")  # assuming an English dictionary

with open('big_file_of_words', 'r') as in_file:
    with open('output_file', 'w') as out_file:
        # read() pulls the entire file into memory at once
        words = in_file.read().split(' ')
        for word in words:
            if d.check(word):
                out_file.write("%s " % word)
I looked at Lazy Method for Reading Big File in Python, which suggests using yield to read in chunks, but I am concerned that chunks of a predetermined size will split words in the middle. Basically, I want the chunks to be as close to the specified size as possible while splitting only on spaces. Any suggestions?
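For what it's worth, this is roughly the kind of chunked reader I have in mind; it's a rough, untested sketch, and the read_words name, the 1 MB chunk size, and the way I carry over a partial word between chunks are just my guesses:

def read_words(file_obj, chunk_size=1024 * 1024):
    """Yield whole words from file_obj, reading roughly chunk_size characters at a time."""
    leftover = ''
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            # flush whatever partial word was left after the final chunk
            if leftover:
                yield leftover
            return
        chunk = leftover + chunk
        pieces = chunk.split(' ')
        # the last piece may be a word cut off mid-chunk; keep it for the next round
        leftover = pieces.pop()
        for word in pieces:
            if word:  # skip empty strings from consecutive spaces
                yield word

The idea is that I could then replace the read().split(' ') line above with something like for word in read_words(in_file): and keep memory use bounded, but I'm not sure this is the right approach.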