UCS-2 unknown characters

Question

From the link below, I can see some unknown UCS-2 characters. What are they? Why are they unknown? Does that mean we cannot decode them?

http://www.columbia.edu/kermit/ucs2.html

Basically, a user is sending a UCS-2 (DCS 8) message to our router, but when I decode it I get some junk characters. For example, D83E DD13 is printed as ? or some other junk. How can I print these characters and view their proper values in a text file?
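A minimal sketch of the decoding step, assuming the DCS=8 payload arrives as raw big-endian bytes (the `Ucs2Decode` class and `decodeDcs8` method are hypothetical names). Decoding with Java's UTF-16BE charset, which reads every valid UCS-2BE code unit the same way, keeps the pair D83E DD13 together as one character instead of two unprintable halves:

```java
import java.nio.charset.StandardCharsets;

// Sketch only: assumes the DCS=8 payload bytes are big-endian, as in D83E DD13.
class Ucs2Decode {

    // Decode as UTF-16BE so surrogate pairs survive as a single character.
    static String decodeDcs8(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xD8, (byte) 0x3E, (byte) 0xDD, (byte) 0x13};
        String text = decodeDcs8(payload);
        System.out.println(text.length());                         // 2 (UTF-16 code units)
        System.out.println(text.codePointCount(0, text.length())); // 1 (Unicode character)
        System.out.printf("U+%X%n", text.codePointAt(0));           // U+1F913
    }
}
```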

Thanks & regards, Ashwini

If you google for "0xD83E 0xDD13" you can see it's an emoji in UTF-16; if that's what's being sent, then it's not representable in UCS-2 — Alex K., Apr 10 '19 at 12:00
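The pair decodes by the standard UTF-16 formula: 0x10000 + (0xD83E - 0xD800) × 0x400 + (0xDD13 - 0xDC00) = 0x10000 + 0xF800 + 0x113 = 0x1F913, i.e. U+1F913 (NERD FACE), a code point above U+FFFF and therefore outside UCS-2. A small sketch that checks the arithmetic against the JDK's `Character.toCodePoint` (the class name and `combine` helper are illustrative):

```java
// Sketch: combine a UTF-16 surrogate pair into one Unicode code point.
class SurrogateMath {

    // Manual form of the UTF-16 decoding formula:
    // cp = 0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00)
    static int combine(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char high = (char) 0xD83E;
        char low = (char) 0xDD13;
        int manual = combine(high, low);
        int library = Character.toCodePoint(high, low); // JDK equivalent
        System.out.printf("U+%X U+%X%n", manual, library); // U+1F913 U+1F913
        // Above U+FFFF: exists in UTF-16 (via surrogates) but not in UCS-2.
        System.out.println(Character.isSupplementaryCodePoint(manual)); // true
    }
}
```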
As in GSM? There isn't one. You can convert UCS-2 to UTF-16. — Alex K., Apr 10 '19 at 12:33
In GSM, we are receiving DCS=8 (which is UCS-2) along with this encoded value D83E DD13. We write it to a text file using a Unicode converter, but it writes junk. Any idea how to write those emojis to a text file? Is it possible? — ashu, Apr 11 '19 at 06:26
Your input is UCS-2-BE (big-endian), so make sure you're converting your input to UTF-16 from that, as opposed to from UCS-2-LE. Make sure you're viewing the converted text file in an editor with the text encoding set to UTF-16 (again, there are BE/LE variants), and ensure the font you're using has a glyph for that emoji. — Alex K., Apr 11 '19 at 10:12
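Following that advice, a sketch of writing the decoded text with an explicit charset rather than the platform default, which may turn the emoji into `?` (the class name, `write` helper and `sms.txt` file name are hypothetical). Java's `StandardCharsets.UTF_16` encoder writes big-endian with a byte-order mark, which helps editors detect the encoding; `StandardCharsets.UTF_8` is a common alternative:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: write text with an explicit encoding instead of the platform default.
class WriteWithEncoding {

    static void write(Path file, String text, Charset charset) throws IOException {
        try (Writer w = Files.newBufferedWriter(file, charset)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "hi " + new String(Character.toChars(0x1F913)); // "hi " + U+1F913
        Path file = Paths.get("sms.txt"); // hypothetical output file
        write(file, text, StandardCharsets.UTF_16); // big-endian, starts with BOM FE FF
        // Reading back with the same charset round-trips the emoji.
        String back = new String(Files.readAllBytes(file), StandardCharsets.UTF_16);
        System.out.println(back.equals(text)); // true
    }
}
```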
How can I identify the characters that fall in the Unicode range 0x0000 to 0xFFFF using Java? — ashu, Apr 12 '19 at 09:56
@ashu a Java `char` is 16-bit, so ALL `char` values fall within the 0x0000-0xFFFF range. What you really need to ask is whether a given `char` represents a UTF-16 surrogate for a Unicode codepoint that is outside of the UCS-2 range (see [What is a "surrogate pair" in Java](https://stackoverflow.com/questions/5903008/)). You can use `Character.is(High|Low)Surrogate()` to test whether a `char` is a UTF-16 surrogate. Codepoints that don't use surrogates are the same in both UCS-2 and UTF-16; codepoints that require surrogates do not exist in UCS-2. — Remy Lebeau, Apr 12 '19 at 21:45
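A sketch of that test applied to the follow-up question: it labels each UTF-16 code unit and checks whether a whole string contains no surrogates, which, following the reasoning in this comment, means every character is representable in UCS-2 (class and method names are illustrative; the `Character` methods are standard JDK API):

```java
// Sketch: classify UTF-16 code units as BMP characters or surrogate halves.
class ClassifyChars {

    static String classify(char ch) {
        if (Character.isHighSurrogate(ch)) return "high surrogate";
        if (Character.isLowSurrogate(ch)) return "low surrogate";
        return "BMP (UCS-2 safe)";
    }

    // True if the string contains no surrogate code units at all.
    static boolean isUcs2Safe(String s) {
        return s.chars().noneMatch(c -> Character.isSurrogate((char) c));
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1F913));
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("index %d: U+%04X %s%n", i, (int) s.charAt(i), classify(s.charAt(i)));
        }
        // index 0: U+0061 BMP (UCS-2 safe)
        // index 1: U+D83E high surrogate
        // index 2: U+DD13 low surrogate
        System.out.println(isUcs2Safe(s)); // false
    }
}
```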
@RemyLebeau I will have a string containing both 16-bit chars and UTF-16 surrogate chars, so I need to strip all of the high/low surrogate chars. I am using the function below; is it correct? Please advise. `str.replaceAll( "([\\ud800-\\udbff\\udc00-\\udfff])", "");` — ashu, Apr 15 '19 at 06:22
@RemyLebeau Or is this approach feasible? StringBuffer finalStr = new StringBuffer(); char[] chars = str.toCharArray(); for(int i=0;i — ashu, Apr 15 '19 at 06:33
@ashu you don't need to specify the two surrogate ranges separately in the regular expression; they are contiguous, so a single range suffices: `str.replaceAll("[\uD800-\uDFFF]", "");` — Remy Lebeau, Apr 15 '19 at 15:23
@ashu if you use the `StringBuffer` approach, you can simplify the loop by using `isSurrogate()` which tests for both high and low. And you don't need the `char[]` at all: `StringBuffer finalStr = new StringBuffer(); for(int i = 0; i < str.length(); i++){ char ch = str.charAt(i); if (!Character.isSurrogate(ch)) { finalStr.append(ch); }}` — Remy Lebeau, Apr 15 '19 at 15:26
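A runnable version of that loop, as a sketch (class and method names are hypothetical). It uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer`, which is the usual choice when the buffer stays local to one method:

```java
// Sketch: drop every UTF-16 surrogate code unit, keeping only UCS-2 characters.
class StripSurrogates {

    static String stripSurrogates(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (!Character.isSurrogate(ch)) { // true for both high and low halves
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = new String(Character.toChars(0x1F913)) + "he\u00e9a\u00e0";
        System.out.println(stripSurrogates(input).equals("he\u00e9a\u00e0")); // true
    }
}
```

Because the loop walks `char` by `char`, it removes both halves of a well-formed pair as well as any unpaired surrogate.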
@RemyLebeau I finally found a solution: `str.replaceAll("[^\u0000-\uffff]", "");`. Basically, if a character doesn't fall within the Basic Multilingual Plane, I replace it with an empty string. The suggested solution `str.replaceAll( "([\\ud800-\\udfff])", "");` is not working. Input: `String str = "heéaà";` output: `?heéaà`. It's not replacing. — ashu, Apr 16 '19 at 06:29
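A plausible reason the BMP filter works: `java.util.regex` matches by code point, so a well-formed pair such as D83E DD13 is seen as the single code point U+1F913, which `[^\u0000-\uFFFF]` matches and removes as a whole, while a class over the surrogate range can fail to match that pair. A sketch of the regex approach next to a regex-free code-point filter (class and method names are illustrative); note that both keep unpaired surrogates, since those count as BMP code points:

```java
// Sketch: remove characters outside the Basic Multilingual Plane.
class BmpFilter {

    // Regex form from the comment above; matches any code point above U+FFFF.
    static String stripNonBmp(String s) {
        return s.replaceAll("[^\\u0000-\\uFFFF]", "");
    }

    // Regex-free equivalent: iterate code points and keep only BMP ones.
    static String keepBmpOnly(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(Character::isBmpCodePoint)
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = new String(Character.toChars(0x1F913)) + "he\u00e9a\u00e0";
        System.out.println(stripNonBmp(input).equals("he\u00e9a\u00e0")); // true
        System.out.println(keepBmpOnly(input).equals("he\u00e9a\u00e0")); // true
    }
}
```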
Does this answer your question? [What is a "surrogate pair" in Java?](https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java) — JosefZ, Nov 08 '20 at 21:59
