How do I convert this string:
"\xa0かかわらず"
to this string?:
"かかわらず"
I.e., how do I remove non-alphanumeric Unicode characters? I've tried the solution that encodes the string as ASCII, but it doesn't work for Japanese characters.
Using re.sub to replace the \W (non-word) pattern with an empty string should work, e.g.
re.sub(r'\W', '', "\xa0かか\xa0わらず")
– metatoaster Jul 18 '19 at 2:59
This works. Since metatoaster only wrote it as a comment and not everybody reads them, I felt free to write this as an actual answer...
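For completeness, here is a minimal sketch of the re.sub approach from the comment, assuming Python 3, where str patterns are Unicode-aware by default. The helper name strip_non_word is only made up for illustration; it is not part of the re module.

```python
import re

def strip_non_word(text: str) -> str:
    """Remove every character that is not a Unicode word character."""
    # In Python 3, \W on a str pattern matches anything that is not a
    # Unicode letter, digit, or underscore. Hiragana count as letters,
    # while the no-break space \xa0 does not.
    return re.sub(r'\W', '', text)

cleaned = strip_non_word("\xa0かかわらず")
print(cleaned)
```

Note that \W also strips ordinary spaces and punctuation, and it keeps underscores. If you only want to drop leading or trailing whitespace such as \xa0, str.strip() may be a better fit.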