I'm trying to make a class that can clean text. The class has several methods, such as converting text to lowercase, spell checking it, lemmatizing it, and removing special characters. Finally, I have a method (`cleaned_text`) that calls all of the above methods in order and returns the final cleaned text. Here is the code:

    import re

    # sym_spell, stop and nlp are defined elsewhere (not shown).

    class TextCleaner:
        def __init__(self, string):
            self.string = string

        def lowercase(self):
            string_lower = self.string.lower()
            return string_lower

        def regex_stripper(self):
            stripped = re.sub(r"[^a-zA-Z0-9 ']+", " ", self.string)
            no_double_spaces = re.sub(r' +', ' ', stripped)
            return no_double_spaces

        def spell_checker(self):
            spell_checked = sym_spell.lookup_compound(self.string, max_edit_distance=2)[0].term
            return spell_checked

        def remove_stop(self):
            no_stop_words = " ".join(i for i in self.string.split() if i not in stop)
            return no_stop_words

        def lemmatize(self):
            doc = nlp(self.string)
            lemmatized_sentence = " ".join([token.lemma_ for token in doc])
            return lemmatized_sentence

        @lowercase
        @regex_stripper
        @spell_checker
        @remove_stop
        @lemmatize
        def cleaned_text(self):
            return self.string

I'm not very well versed in decorators, so sorry for the clumsy code. The `cleaned_text` method ought to run the following methods in order: `lowercase`, `regex_stripper`, `remove_stop`, `lemmatize`, and then finally return the cleaned text.
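
To make the intended behaviour concrete, here is a minimal sketch of the "run each step in order" idea written without decorators. It is only an illustration under a few assumptions: each step is rewritten to take a string and return a new one instead of reading `self.string`, `STOP` is a small hypothetical stand-in for the `stop` collection, and the steps that depend on `sym_spell` and `nlp` are left out because those objects are not shown in the question.

```python
import re

# Hypothetical stand-in for the `stop` collection used in the question.
STOP = {"the", "a", "an", "is", "of"}


class TextCleaner:
    def __init__(self, string):
        self.string = string

    # Each step takes a string and returns a new string, so the output of
    # one step can be fed straight into the next.
    @staticmethod
    def lowercase(text):
        return text.lower()

    @staticmethod
    def regex_stripper(text):
        # Replace anything other than letters, digits, spaces and apostrophes
        # with a space, then collapse runs of spaces.
        stripped = re.sub(r"[^a-zA-Z0-9 ']+", " ", text)
        return re.sub(r" +", " ", stripped)

    @staticmethod
    def remove_stop(text):
        return " ".join(word for word in text.split() if word not in STOP)

    def cleaned_text(self):
        # The order of this list is the order in which the steps are applied.
        steps = [self.lowercase, self.regex_stripper, self.remove_stop]
        text = self.string
        for step in steps:
            text = step(text)
        return text


print(TextCleaner("The QUICK, brown fox!! is a friend of Bob's").cleaned_text())
# prints: quick brown fox friend bob's
```

For comparison, `@lowercase` placed above `def cleaned_text` means `cleaned_text = lowercase(cleaned_text)`, so the function object itself is passed in where `self` is expected. That is why stacking the methods as decorators does not produce the chained behaviour described here.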