How do I strip punctuation from ASCII and UTF-8 encoded strings in R without corrupting the original UTF-8 characters, specifically Chinese?
library(stringi)
text <- "Longchamp Le Pliage 肩背包 (小)"
stri_replace_all_regex(text, '\\p{P}', '')
results in:
Longchamp Le Pliage ��背�� 小
but the desired result should be:
Longchamp Le Pliage 肩背包 小
I'm looking to remove all CJK Symbols and Punctuation as well as ASCII punctuation.
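To separate an input-encoding problem from a console-printing problem, here is a minimal sketch of the same replacement with the string built from `\u` escapes, so it is marked as UTF-8 regardless of the session locale or the encoding of the source file. The variable names `cleaned` and `expected` are only illustrative, and it is an assumption (not something confirmed here) that the literal in my original call is being mangled by the Windows-1252 locale before stringi ever sees it.

```r
library(stringi)

# Same text as above, written with \u escapes so that R marks it as UTF-8
# instead of re-encoding the literal through the native (1252) locale.
# U+80A9 U+80CC U+5305 are the three characters before the parentheses,
# U+5C0F is the character inside them.
text <- "Longchamp Le Pliage \u80a9\u80cc\u5305 (\u5c0f)"

# Should report "UTF-8" for a string containing \u escapes.
Encoding(text)

# \p{P} matches any Unicode punctuation character, ASCII or not.
cleaned <- stri_replace_all_regex(text, "\\p{P}", "")

# Compare against the expected string instead of trusting what the console
# prints, since printing on a non-UTF-8 locale can itself be lossy.
expected <- "Longchamp Le Pliage \u80a9\u80cc\u5305 \u5c0f"
cleaned == expected

# Inspect the result as escaped code points.
stri_escape_unicode(cleaned)
```

If the comparison is `TRUE` here but the original call still prints garbage, that would suggest the regex itself is fine and the damage happens when the literal is read in or when the result is printed.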
@akrun, the locale section of sessionInfo() is as follows:
locale:
[1] LC_COLLATE=English_Singapore.1252 LC_CTYPE=English_Singapore.1252 LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C LC_TIME=English_Singapore.1252