I would like to replace the two concatenated characters \uD800\uDC00 with an x, but strangely I get a weird character instead. Could someone show me what is wrong here? When I run the following code:

System.out.println("\uD800\uDC00".replaceAll("([\uD800-\uDBFF]&&['\uDC00'-'\uDFFF'])", "x"));

I get this character as output

Bionix1441

2 Answers

First off, the regex is not written the way you intend. Outside a character class, "&&" is matched literally, so it should not be there. Likewise, the single quotes and parentheses should not be there.

The syntax corrections above are required, but not sufficient. \uD800 is a "magic" character: a high surrogate. It combines with the low surrogate that follows it (here \uDC00) to form a single Unicode code point, which takes 4 bytes in UTF-16: https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates
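To make the surrogate-pair point concrete, here is a minimal sketch that compares the number of Java chars with the number of code points in the string from the question. The class and method names (`SurrogatePairDemo`, `charCount`, `codePointCount`) are made up for illustration; only standard `String` and `Character` methods are used.

```java
public class SurrogatePairDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "\uD800\uDC00"; // the string from the question

        System.out.println(charCount(s));                           // 2
        System.out.println(codePointCount(s));                      // 1
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // 10000
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
    }
}
```

Two chars, one code point: that mismatch is why a pattern written in terms of individual chars behaves unexpectedly here.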

The regex is interpreted using Unicode code points, not Java characters. \uD800\uDC00 is a single Unicode code point (U+10000), so the regex doesn't match. I think you probably want to replace all Unicode code points outside the 16-bit range \u0000 - \uFFFF. So this is probably what you want:

System.out.println("\uD800\uDC00".replaceAll("[^\u0000-\uFFFF]", "x"));
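As a rough sketch of the same idea in reusable form, the two hypothetical helpers below wrap the negated-range pattern above and an equivalent pattern that names the supplementary range explicitly with the `\x{h...h}` escape (available in `java.util.regex.Pattern` since Java 7). The class and method names are assumptions for illustration only.

```java
public class SupplementaryReplaceDemo {
    // Replaces every code point outside U+0000..U+FFFF with "x",
    // using the negated character class from the answer.
    static String replaceOutsideBmp(String input) {
        return input.replaceAll("[^\u0000-\uFFFF]", "x");
    }

    // Same effect, but the supplementary range U+10000..U+10FFFF
    // is written out explicitly with regex hex escapes.
    static String replaceSupplementaryRange(String input) {
        return input.replaceAll("[\\x{10000}-\\x{10FFFF}]", "x");
    }

    public static void main(String[] args) {
        String s = "a" + "\uD800\uDC00" + "b";

        System.out.println(replaceOutsideBmp(s));         // axb
        System.out.println(replaceSupplementaryRange(s)); // axb
        System.out.println(replaceOutsideBmp("abc"));     // abc
    }
}
```

Either form replaces the whole surrogate pair with a single x, because the regex engine consumes the pair as one code point.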
Peter Headland
This pattern

([\uD800-\uDBFF]&&['\uDC00'-'\uDFFF'])

does not match anything in the String

\uD800\uDC00

so "x" is not replacing anything.
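A quick way to check this is to run the pattern from the question against the string and see whether it finds anything. The sketch below assumes the pattern is copied verbatim from the question; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class OriginalPatternCheck {
    // The pattern exactly as written in the question.
    static final String QUESTION_PATTERN =
            "([\uD800-\uDBFF]&&['\uDC00'-'\uDFFF'])";

    // True if the question's pattern matches anywhere in the input.
    static boolean matches(String input) {
        return Pattern.compile(QUESTION_PATTERN).matcher(input).find();
    }

    // Applies the question's replaceAll call to the input.
    static String replace(String input) {
        return input.replaceAll(QUESTION_PATTERN, "x");
    }

    public static void main(String[] args) {
        String s = "\uD800\uDC00";

        System.out.println(matches(s));           // false
        System.out.println(replace(s).equals(s)); // true: nothing was replaced
    }
}
```

Because "&&" sits outside a character class, the pattern requires two literal ampersands in the input, which this string does not contain, so the original character is printed unchanged.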

Code Whisperer
  • Well, replaceAll tries to find that substring in the String you provided. You need to provide a String that contains the corresponding substring if you want it to be replaced. replaceAll only works for String. Here is an example with replaceAll: [replaceAll](http://stackoverflow.com/questions/20556101/java-replace-all-in-a-string-with) – Code Whisperer Mar 13 '15 at 17:50
  • Yeah, but I want to change the concatenation of these characters into an x, so I don't have to display that square – Bionix1441 Mar 13 '15 at 17:53
  • Your code will do that if the String you provide has that concatenation. It does not have it. I think you are logically trying to pass it two ways that String might be written, but you are doing it in one line. You need to do two checks, one for each way the concatenation might be written. – Code Whisperer Mar 13 '15 at 17:59