How would I remove all Unicode symbols from this string: `【Hello!】★ ああああ`? I need to remove all the "weird" symbols (【, ★, 】) and keep "Hello!" and "ああああ". This needs to work for all languages, not just Japanese.
-3
- Are the weird symbols `【`, `★`, and `】` in this case? Do we need to consider other symbols? – kuromoka Sep 30 '18 at 03:20
- Yes, other symbols as well. – BurstingKitten Sep 30 '18 at 03:23
2 Answers
1
You want to remove characters in the Unicode general categories Other Symbol, Modifier Symbol, and Enclosing Mark, but leave those from other categories.
In regular expressions, these correspond to the classes `\p{So}`, `\p{Sk}`, and `\p{Me}`, respectively. You might, for example, use `XRegExp.replace()`.
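
As a minimal sketch of the category approach, the snippet below uses JavaScript's built-in Unicode property escapes (ES2018, which need the `u` flag) rather than XRegExp; that substitution is an assumption, not part of the answer above. The helper name `stripCategories` and its default category list are hypothetical. Note that `【` and `】` belong to the Open Punctuation (`Ps`) and Close Punctuation (`Pe`) categories, so the three symbol classes alone leave them in place; adding `Ps` and `Pe` removes them, but would also remove ordinary brackets such as `(` and `)`.

```javascript
// Hypothetical helper: the name and the default category list are illustrative.
function stripCategories(text, categories = ['So', 'Sk', 'Me']) {
  // Build a character class such as [\p{So}\p{Sk}\p{Me}].
  // The "u" flag is required for \p{...} in native JavaScript regexes.
  const body = categories.map((c) => `\\p{${c}}`).join('');
  const pattern = new RegExp(`[${body}]`, 'gu');
  return text.replace(pattern, '');
}

const input = '【Hello!】★ ああああ';

// Only ★ (Other Symbol) is removed; the brackets are punctuation and stay.
const symbolsOnly = stripCategories(input);
console.log(symbolsOnly); // 【Hello!】 ああああ

// Adding Ps/Pe also strips the lenticular brackets.
const withBrackets = stripCategories(input, ['So', 'Sk', 'Me', 'Ps', 'Pe']);
console.log(withBrackets); // Hello! ああああ
```

Which categories to strip depends on what counts as "weird" for the text at hand, so the list is left as a parameter here.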

Davislor
- PHP has a regular expression class that looks like `\p{common}`, which would work, *but* that is PHP and I need JavaScript. The same goes for yours. – BurstingKitten Sep 30 '18 at 03:41
- There are [regex libraries for JavaScript that support categories.](https://regular-expressions.mobi/xregexp.html) – Davislor Sep 30 '18 at 03:48
-1
I have found a solution. Using XRegExp, I was able to use PHP's `\p{Common}` in Node.
const xreg = require('xregexp');
let str = '【Hello!】★ ああああ】';
let regex = new xreg('\\p{Common}', 'g');
let res = xreg.replace(str, regex, ' ');
console.log(res); // "Hello" and "ああああ" remain, padded with spaces; the "!" is replaced too, since it is in the Common script
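
For comparison, here is a sketch of the same idea without the library, assuming a runtime with ES2018 Unicode property escapes (this is a swapped-in native technique, not the XRegExp call above). `\p{Script=Common}` matches every character whose script is Common, which includes ASCII punctuation and spaces, so the `!` in "Hello!" is lost as well. The variable names are illustrative.

```javascript
// Native counterpart of \p{Common}: the Script=Common property (needs the "u" flag).
const commonScript = /\p{Script=Common}/gu;

const sample = '【Hello!】★ ああああ】';

// Replace every Common-script character (brackets, star, "!", spaces) with a space.
const replaced = sample.replace(commonScript, ' ');

// Collapse the resulting runs of spaces and trim the ends.
const tidy = replaced.replace(/\s+/g, ' ').trim();

console.log(tidy); // Hello ああああ
```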

BurstingKitten
- Side note: PHP borrowed that syntax from PCRE, which stands for Perl-Compatible Regular Expressions. (Its syntax is not really Perl-compatible.) – Davislor Sep 30 '18 at 09:21