Consider the following code for spell correction:
from autocorrect import spell
import re

WORD = re.compile(r'\w+')

def reTokenize(doc):
    tokens = WORD.findall(doc)
    return tokens

text = ["Hi, welcmoe to speling.",
        "This is jsut an exapmle, but cosnider a veri big coprus."]

def spell_correct(text):
    sptext = []
    for doc in text:
        sptext.append(' '.join([spell(w).lower() for w in reTokenize(doc)]))
    return sptext

print(spell_correct(text))
Here is the output of the code above:
My first question is: how can I stop the output from being displayed in a Jupyter notebook? With a large number of text documents, there will be a lot of output.
My second question is: how can I improve the speed and accuracy of this code when applying it to a large dataset? For accuracy, see for example how the word "veri" is handled in the output. Is there a better way to do this? I would appreciate any answers, including alternative solutions that run faster.
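
To make the speed concern concrete, one possible direction is to correct each distinct word only once and reuse the result, since a large corpus repeats most of its words. The following is only a minimal sketch under stated assumptions: `correct_word` and its `_FIXES` table are hypothetical stand-ins for autocorrect's `spell`, used so the example runs on its own, and `spell_correct_cached` is an illustrative name, not part of any library.

```python
import re
from functools import lru_cache

WORD = re.compile(r'\w+')

# Hypothetical lookup table standing in for a real spell checker.
_FIXES = {"welcmoe": "welcome", "speling": "spelling", "jsut": "just"}

@lru_cache(maxsize=None)
def correct_word(word):
    # Stand-in for spell(word); each distinct word is computed once,
    # and repeated words are served from the cache.
    return _FIXES.get(word, word)

def spell_correct_cached(docs):
    # Same tokenize / correct / lowercase / join steps as spell_correct.
    return [' '.join(correct_word(w).lower() for w in WORD.findall(doc))
            for doc in docs]

docs = ["Hi, welcmoe to speling.", "This is jsut speling."]
corrected = spell_correct_cached(docs)  # assigned to a variable, not printed
```

Because the result is assigned to `corrected` and never printed, a notebook cell containing this code displays nothing. Caching only reduces repeated work; it does not change which correction is chosen for a given word, so it would not help with the accuracy issue.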