My goal is to write a function that takes a text, replaces all characters except Latin letters (A-Z, a-z) with spaces, and deletes every word that contains a digit. It then collapses each run of whitespace into a single space.
Example:
' hello, world! ho1hoho2ho, merry xmas!! ho1ho1 :))' -> 'hello world merry xmas'.
The Python function that implements this:
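    import re

    def clean_text(text):
        # Replace everything except ASCII letters and digits with a space.
        text_valid = re.sub(r'[^A-Za-z0-9]', ' ', text)
        # Drop words containing a digit; split/join also collapses whitespace.
        return ' '.join(word for word in text_valid.split()
                        if not re.search(r'\d', word))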
Now I wonder whether a single regular expression could do all of this, so that I could just write something like

    return ' '.join(re.findall(enter_my_magical_regex_here, text))
Or, maybe, there is another way to replace the code above with something faster (and, hopefully, shorter)?
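For illustration, here is a minimal sketch of what such a single-regex version might look like. It assumes the intended behaviour is "keep only runs of ASCII letters that are not adjacent to another letter or digit", which matches the example above. The names `WORD_RE` and `clean_text_single_regex` are made up for this sketch, and whether it is actually faster would have to be measured.

```python
import re

# Hypothetical pattern: a run of ASCII letters that is neither preceded
# nor followed by an ASCII letter or digit, i.e. a whole digit-free word.
WORD_RE = re.compile(r'(?<![A-Za-z0-9])[A-Za-z]+(?![A-Za-z0-9])')

def clean_text_single_regex(text):
    # findall returns the matching words in order; join puts exactly
    # one space between them, so no separate whitespace pass is needed.
    return ' '.join(WORD_RE.findall(text))

print(clean_text_single_regex(' hello, world! ho1hoho2ho, merry xmas!! ho1ho1 :))'))
# -> hello world merry xmas
```

The lookbehind and lookahead are what stop the pattern from picking up the letter-only pieces of a token such as `ho1hoho2ho`.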