-3

How would I remove all Unicode symbols from this string: 【Hello!】★ ああああ? I need to remove all the "weird" symbols (【, ★, 】) and keep "Hello!" and "ああああ". This needs to work for all languages, not just Japanese.

2 Answers

1

You want to remove characters in the Unicode general categories Other Symbol, Modifier Symbol, and Enclosing Mark, but leave those from other categories.

In a regular expression, those categories correspond to the classes \p{So}, \p{Sk} and \p{Me}, respectively. You might, for example, use XRegExp.replace().
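As a rough sketch of the category approach, here is a version that swaps XRegExp for JavaScript's native Unicode property escapes (ES2018, requires the `u` flag), so it needs no library. The names `stripSymbols`, `SYMBOLS` and `SYMBOLS_AND_BRACKETS` are made up for illustration. Note that 【 and 】 are classified as Open Punctuation (\p{Ps}) and Close Punctuation (\p{Pe}), not as symbols, so the second pattern adds those two categories; that is an assumption about what should count as "weird", and it would also strip ordinary brackets such as ( and ].

```javascript
// Categories named in the answer: Other Symbol, Modifier Symbol, Enclosing Mark.
const SYMBOLS = /[\p{So}\p{Sk}\p{Me}]/gu;

// Assumption: bracket-like punctuation (Ps/Pe) should go too.
// This also removes ASCII brackets such as ( ) [ ] { }.
const SYMBOLS_AND_BRACKETS = /[\p{So}\p{Sk}\p{Me}\p{Ps}\p{Pe}]/gu;

// Hypothetical helper: delete every character matched by `pattern`.
function stripSymbols(text, pattern = SYMBOLS) {
  return text.replace(pattern, '');
}

const input = '【Hello!】★ ああああ';

const onlySymbols = stripSymbols(input);
console.log(onlySymbols); // 【Hello!】 ああああ  (only ★ is in So)

const withBrackets = stripSymbols(input, SYMBOLS_AND_BRACKETS);
console.log(withBrackets); // Hello! ああああ
```

The "!" survives in both cases because it belongs to Other Punctuation (\p{Po}), which neither pattern touches.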

Davislor
  • PHP has a regular expression class that looks like `\p{common}`, which would work, *but* that is PHP and I need JavaScript. The same goes for yours. – BurstingKitten Sep 30 '18 at 03:41
  • There are [regex libraries for JavaScript that support categories.](https://regular-expressions.mobi/xregexp.html) – Davislor Sep 30 '18 at 03:48
-1

I have found a solution. Using XRegExp, I was able to use PHP's \p{Common} in Node.js.

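const xreg = require('xregexp'); // npm install xregexp

const str = '【Hello!】★ ああああ】';
// \p{Common} matches every character in the Unicode "Common" script.
// That covers the brackets and the star, but also '!' and the space.
const regex = new xreg('\\p{Common}', 'g');
const res = xreg.replace(str, regex, ' ');

console.log(res); // ' Hello    ああああ ' (note that '!' is replaced as well)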
  • Side note: PHP borrowed that syntax from PCRE, which stands for Perl-Compatible Regular Expressions. (Its syntax is not really Perl-compatible.) – Davislor Sep 30 '18 at 09:21
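For anyone who would rather avoid the dependency, the \p{Common} replacement in this answer can probably be reproduced with a native Unicode property escape. This is only a sketch, assuming a runtime with ES2018 support (for example Node.js 10 or later). `replaceCommon` and `COMMON_SCRIPT` are illustrative names, not part of any library.

```javascript
// Native counterpart of XRegExp's \p{Common}: characters whose Script property is Common.
const COMMON_SCRIPT = /\p{Script=Common}/gu;

// Hypothetical helper: replace each Common-script character with `replacement`.
function replaceCommon(text, replacement = ' ') {
  return text.replace(COMMON_SCRIPT, replacement);
}

const result = replaceCommon('【Hello!】★ ああああ】');
console.log(JSON.stringify(result)); // " Hello    ああああ "
```

As with the XRegExp version, "!" and the space are in the Common script too, so they are replaced along with the brackets and the star.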