I want to remove characters in the Miscellaneous Symbols block from a Unicode string using a regular expression. I have tried several regular expressions, but none of them work. Can anyone help me remove the Miscellaneous Symbols block from the string?

Unicode String

\u263A\uD83D\uDE0A\uD83D\uDE22)\uD83C\uDF82

Code:

String input = "\u263A\uD83D\uDE0A\uD83D\uDE22)\uD83C\uDF82";
input.replaceAll("[\u2600-\u26FF]|[\u2700-\u27BF]", "");

Expected:

\uD83D\uDE0A\uD83D\uDE22)\uD83C\uDF82

But it does not work. How can I solve this issue?

Kishan Donga
  • Tip: You can match Unicode blocks with `\p{InMiscellaneousSymbols}|\p{InDingbats}` or `[\p{InMiscellaneousSymbols}\p{InDingbats}]`. This is much more readable. – nwellnhof Aug 16 '17 at 11:23
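As a minimal sketch of the tip above, the two hexadecimal ranges can be replaced by Java's named Unicode block classes. The class name `BlockStrip` and the method `stripSymbols` are hypothetical names chosen for this illustration; the result is assigned to a new value because `replaceAll` returns a new string.

```java
public class BlockStrip {
    // Removes every character in the Miscellaneous Symbols (U+2600..U+26FF)
    // and Dingbats (U+2700..U+27BF) blocks, using named block classes
    // instead of explicit hexadecimal ranges.
    static String stripSymbols(String s) {
        return s.replaceAll("[\\p{InMiscellaneousSymbols}\\p{InDingbats}]", "");
    }

    public static void main(String[] args) {
        // White smiling face (U+263A) followed by three emoji stored as surrogate pairs.
        String input = "\u263A\uD83D\uDE0A\uD83D\uDE22)\uD83C\uDF82";
        // Only the leading U+263A is removed; the surrogate pairs are untouched.
        System.out.println(stripSymbols(input));
    }
}
```

The emoji outside the Basic Multilingual Plane survive because Java's regex engine matches by code point, and none of them fall inside either block.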

2 Answers

It does not work because String is immutable in Java: replaceAll returns a new string and leaves input unchanged. You have to assign the result, like this:

String result = input.replaceAll("[\u2600-\u26FF]|[\u2700-\u27BF]", "");

Or simply:

input = input.replaceAll("[\u2600-\u26FF]|[\u2700-\u27BF]", "");

So if you print the result next to the expected string, like this:

System.out.println(input);
System.out.println("\uD83D\uDE0A\uD83D\uDE22)\uD83C\uDF82");

Both print:

😊😢)🎂
😊😢)🎂
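To make the immutability point concrete, here is a small self-contained sketch that compares the string length before and after the result is assigned. The class name `ImmutableDemo` and the helper `clean` are hypothetical names used only for this illustration.

```java
public class ImmutableDemo {
    // Same pattern as in the question: Miscellaneous Symbols or Dingbats.
    static final String SYMBOLS = "[\u2600-\u26FF]|[\u2700-\u27BF]";

    // Returns a new string; the argument itself is never modified.
    static String clean(String s) {
        return s.replaceAll(SYMBOLS, "");
    }

    public static void main(String[] args) {
        String input = "\u263A\uD83D\uDE0A\uD83D\uDE22)\uD83C\uDF82";

        input.replaceAll(SYMBOLS, "");       // result is discarded, input is unchanged
        System.out.println(input.length());  // 8 chars: U+263A is still present

        input = clean(input);                // keep the returned string
        System.out.println(input.length());  // 7 chars: U+263A has been removed
    }
}
```

The lengths are counted in UTF-16 chars, so each emoji contributes two and the removed symbol accounts for the difference of one.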
Youcef LAIDANI
  • Thank you YCF_L. Can you please try this one: `String input = "\\u263A\\uD83D\\uDE0A\\uD83D\\uDE22\\uD83C\\uDF82"; input = input.replaceAll("[\\u2600-\\u26FF]|[\\u2700-\\u27BF]", ""); System.out.println(input);` Output: – Kishan Donga Aug 15 '17 at 10:44
  • \u263A\uD83D\uDE0A\uD83D\uDE22\uD83C\uDF82 – Kishan Donga Aug 15 '17 at 10:44
  • @KishanDonga The output is correct, because with a double backslash the input contains the literal text of the escapes rather than the characters themselves. Use a single backslash in the Unicode string, like `String input = "\u263A\uD83D\uDE0A\uD83D\uDE22\uD83C\uDF82"; input = input.replaceAll("[\u2600-\u26FF]|[\u2700-\u27BF]", ""); System.out.println(input);` and the output is now 😊😢🎂 – Youcef LAIDANI Aug 15 '17 at 11:31

If the input text contains u-escaped characters as literal text (a backslash, 'u', and 4 hexadecimal digits), convert them to real chars first.

input = StringEscapeUtils.unescapeJava(input); // From Apache Commons
input = input.replaceAll("[\u2600-\u26FF]|[\u2700-\u27BF]", "");
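If Apache Commons is not available, the same two steps can be sketched with the JDK alone. This is an illustrative stand-in, not the Commons implementation: the class `UnescapeDemo` and the methods `unescape` and `clean` are hypothetical, and the unescaping handles only the backslash-u form with four hexadecimal digits (no octal escapes, no escaped backslashes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeDemo {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Step 1: turn each escape written out as text into the char it names.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Step 2: remove Miscellaneous Symbols and Dingbats from the real chars.
    static String clean(String text) {
        return unescape(text).replaceAll("[\u2600-\u26FF]|[\u2700-\u27BF]", "");
    }

    public static void main(String[] args) {
        // Double backslashes: the string holds the escape text, not the characters.
        String raw = "\\u263A\\uD83D\\uDE0A\\uD83D\\uDE22\\uD83C\\uDF82";
        System.out.println(raw.length());        // 42: seven escapes of six chars each
        System.out.println(clean(raw).length()); // 6: three emoji as surrogate pairs
    }
}
```

Running the replacement directly on `raw` would change nothing, since the string contains no character from either block until the escapes are converted.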
Joop Eggen