I have a string which contains both English and Arabic and now I need to remove special characters.
I know there is a regex solution:
re.sub('[^A-Za-z0-9]+', '', mystring)
but this regex is also removing Arabic letters from the string.
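If the underscore (_) is not one of your special characters, one clean way around this is to use the word-character class \w together with the Unicode flag. \w matches letters and digits from any script, plus the underscore, so Arabic letters are kept. (In Python 3, strings are Unicode by default and \w is already Unicode-aware, so the flag is only needed in Python 2.)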
In [10]: s = "#$&%NKGS&$@023489_7نسیتلبskdjfh3%-"
In [11]: re.sub(r'[^\w]+', '', s, flags=re.U)
Out[11]: 'NKGS023489_7نسیتلبskdjfh3'
If the underscore is one of your special characters, you can add it to the pattern like this:
In [12]: re.sub(r'[^\w]+|_', '', s, flags=re.U)
Out[12]: 'NKGS0234897نسیتلبskdjfh3'
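For Python 3, the two variants above could be wrapped into one small helper. This is only a sketch: the function name strip_special and the keep_underscore parameter are made up for illustration, and it relies on \W being the negation of \w, so r'\W+' behaves the same as r'[^\w]+'.

```python
import re

def strip_special(text, keep_underscore=True):
    """Drop every character that is not a letter, digit, or (optionally) underscore.

    Hypothetical helper for illustration; not part of the re module.
    """
    # On Python 3 str patterns, \w is Unicode-aware by default,
    # so Arabic letters count as word characters and are kept.
    pattern = r"\W+" if keep_underscore else r"[\W_]+"
    return re.sub(pattern, "", text)

s = "#$&%NKGS&$@023489_7نسیتلبskdjfh3%-"
print(strip_special(s))                         # underscore kept
print(strip_special(s, keep_underscore=False))  # underscore removed too
```

Note that whitespace is also a non-word character, so spaces between words are removed as well. If you want to keep them, add \s inside a negated class, for example r'[^\w\s]+'.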