I want to remove all punctuation symbols from my list, which contains Unicode text in the Gujarati language. When I ran the code shown below, it had no effect: no symbols were removed from the list, although no syntax error was reported. I had already removed the digits from the list successfully.

....
....
filtered_words = [word for word in words if not re.search(r"[\P]+",word)]
....
Uttam

1 Answer

You don't need to use regex here.

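You can use string.translate like this (note that this two-argument form works only on Python 2 byte strings, and string.punctuation covers ASCII punctuation only):

filtered_words = [word for word in words if word == word.translate(None, string.punctuation)]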

If you must use regex then use:

filtered_words = [word for word in words if not re.search(ur'[^\w\s]', word, re.UNICODE)]

Check this Q&A or Unicode Punctuation detection
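
For Unicode text such as Gujarati, a regex-free alternative is to test each character's Unicode general category. Python's standard re module does not support the \p{...} / \P{...} property classes, and \w can also fail to match Gujarati combining vowel signs, so the sketch below uses unicodedata instead. It is written for Python 3; the helper names and the sample words list are assumptions made for illustration, not part of the original code.

```python
import unicodedata

def is_punctuation(ch):
    # Unicode general categories starting with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po)
    # are punctuation. Symbols such as "$" or "+" are in "S*" and are NOT covered.
    return unicodedata.category(ch).startswith("P")

def strip_punctuation(word):
    # Keep every character that is not punctuation; letters, vowel signs
    # and other combining marks are left untouched.
    return "".join(ch for ch in word if not is_punctuation(ch))

# Hypothetical sample data: Gujarati tokens mixed with punctuation.
words = ["ગુજરાતી", "ભાષા,", "!", "છે।", "?"]

# Strip punctuation inside each token, then drop tokens that became empty.
cleaned = [strip_punctuation(w) for w in words]
filtered_words = [w for w in cleaned if w]
```

If you would rather discard any word that contains punctuation, as the list comprehensions above do, filter with `not any(is_punctuation(ch) for ch in word)` instead of stripping.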

anubhava