I tried the code below to remove punctuation from a string:
import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
This works fine for Latin (Roman) script text, but it has a problem with text in other Unicode scripts such as Hindi or Telugu.
For example:
import re
s = "అనేది దేనికి సమానం అవుతుంది."
s = re.sub(r'[^\w\s]','',s)
This changes the text itself and makes it unreadable, because it also removes the dependent vowel signs of that script.
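To make the problem visible, here is a small diagnostic sketch that lists every character the pattern matches, along with its Unicode general category. It assumes Python 3 and uses only the standard `re` and `unicodedata` modules; the variable name `removed` is just for illustration.

```python
import re
import unicodedata

s = "అనేది దేనికి సమానం అవుతుంది."

# Every character matched by [^\w\s], i.e. everything re.sub would delete.
removed = re.findall(r'[^\w\s]', s)

for ch in removed:
    # Categories starting with "M" are combining marks (the dependent vowel
    # signs fall here); categories starting with "P" are punctuation.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
```

If the vowel signs show up with an `M*` category, it suggests that `\w` in the `re` module does not treat combining marks as word characters, so they get removed together with the real punctuation.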
So my question is: how can I remove punctuation from text written in a script other than Latin?
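One direction that might work is to filter on the Unicode general category instead of relying on `\w`. The following is only a minimal sketch under that assumption, not a confirmed solution; `strip_punctuation` is a hypothetical helper name, and it treats as punctuation only characters whose category starts with `P`.

```python
import unicodedata

def strip_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    # Letters (L*), combining marks (M*) and whitespace are left untouched,
    # so dependent vowel signs should survive.
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )

print(strip_punctuation("string. With. Punctuation?"))
print(strip_punctuation("అనేది దేనికి సమానం అవుతుంది."))
```

Note that symbols such as `$` or `+` belong to the `S*` categories and would be kept by this sketch. The third-party `regex` package also supports Unicode property classes such as `\p{P}`, which may be an alternative if adding a dependency is acceptable.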
The linked duplicate question only removes punctuation from Latin-script strings, which, as I already mentioned, works. My issue is removing punctuation from strings in other Unicode scripts. There is a clear difference, so this is not a duplicate.