I would like to remove unknown words and characters from a sentence. The text is the output of a transformer model, which sometimes produces repeated unknown words. I need to remove those words to make the sentence readable.

Input

text = "This is an example sentence 098-1832-1133 and this is another sentence.WAA-FAHHaAA. This is the third sentence WA WA WA aZZ aAD"

Expected Output

text = "This is an example sentence and this is another sentence. This is the third sentence"
Procodedev
  • Here is a similar question: https://stackoverflow.com/questions/41290028/removing-non-english-words-from-text-using-python – dnbwise Feb 25 '22 at 15:04
  • I checked the answers. None of them are useful. – Procodedev Feb 25 '22 at 23:17
  • If you look at the accepted answer from the other question, most of the non-words would be removed. Your example would end up as: 'This is an example sentence and this is another sentence This is the third sentence WA WA WA'. Notice WA is left over, since it is an abbreviated state name; for such instances, you can then put some ad hoc rules in place. – dnbwise Feb 26 '22 at 00:24
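The dictionary-filter idea described in the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: `VOCAB` is a tiny hand-written word set made up for this example, `keep_known_words` and its `drop` parameter are hypothetical names, and a real use would need a much larger word list (for instance NLTK's `words` corpus, which is a suggestion here and not something the thread names). The `drop` set stands in for the ad hoc rules mentioned above, such as excluding `WA` when the word list happens to contain it.

```python
import re

# Hypothetical, hand-written vocabulary for illustration only.
# A real application would load a full English word list instead.
VOCAB = {"this", "is", "an", "example", "sentence", "and", "another", "the", "third"}


def keep_known_words(text, vocab, drop=frozenset()):
    """Keep alphabetic tokens found in `vocab`; discard everything else.

    `drop` holds exact tokens to remove even if the vocabulary contains them
    (an ad hoc rule, e.g. {"WA"}).
    """
    # Tokens are runs of ASCII letters or single periods; digits, hyphens
    # and other symbols never match and are therefore discarded.
    tokens = re.findall(r"[A-Za-z]+|\.", text)
    kept = []
    for tok in tokens:
        if tok == ".":
            # Attach a period to the last kept word, avoiding duplicates.
            # Note: a period that followed a discarded word ends up on the
            # previous kept word.
            if kept and not kept[-1].endswith("."):
                kept[-1] += "."
        elif tok.lower() in vocab and tok not in drop:
            kept.append(tok)
    return " ".join(kept)


text = ("This is an example sentence 098-1832-1133 and this is another "
        "sentence.WAA-FAHHaAA. This is the third sentence WA WA WA aZZ aAD")
print(keep_known_words(text, VOCAB))
# This is an example sentence and this is another sentence. This is the third sentence
```

With a real word list the result depends entirely on what that list contains, so short tokens such as `WA` may survive unless they are added to `drop`. Proper nouns and domain terms missing from the list would also be removed, which may or may not be acceptable for model output.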
