Android some unicode values (emoticons) not recognized

Question

I'm using the code below to show text that contains emoticons in an EditText:

EditText et = (EditText) findViewById(R.id.myeditext);
et.setText(StringEscapeUtils.unescapeJava("This is a text with emoji \u263A"));

This shows the text I wrote followed by a smiley emoticon.

But if I use another value instead of \u263A, for example \u1F60A, it doesn't work. It shows something like the image in this question:

Unicode character (U+1FXYZ) not outputting correctly when used in code-behind

Does anyone know how to handle this? Thank you.

UPDATE

How can I use the answer given below, or the answer in the supposed duplicate question, when the string that contains the Unicode escapes is random (not known in advance)?

This is pseudocode for what I want to achieve:

for ( eachFbComment as (String) randomString ) {
    //randomString example: "This is a text with emoji \u263A, string continues here with another emoji \u1F60A, and a last emoji here \u263A! "
    print (randomString); // Here I want to display the text + emojis instead of unicode characters.
}

Are you serious? You have already answered your question by the link above. And searching on stack overflow gives you an answer for java within 10 seconds: http://stackoverflow.com/questions/9834964/char-to-unicode-more-than-uffff-in-java?rq=1 — chuhx, Jun 14 '16 at 12:34
@user1992 Please give a clear example of what you mean by "unicodes is random". It's very unclear what you mean here. How is the unicode random? Is it typed in by the user? Is it selected from a list of emoticons? Please show a representative example that demonstrates *precisely* what you're trying to accomplish. — Michael Gaskill, Jun 14 '16 at 22:59
@MichaelGaskill I didn't say that unicode is random but the string that contains unicodes is random. For example, if I want to display the comments that are in a facebook post. These comments contains random strings that might include emoticons. In such a case, supposing that I have the required mechanism to get these comments, how can I use the answer given below when unicodes are part of a more general string? How can I convert any single unicode in the way described in the accepted answer? Thank you. — user19922, Jun 15 '16 at 09:57

Joop Eggen · Accepted Answer · 2016-06-15T14:19:52.327

1

The \uXXXX escape takes exactly 4 hexadecimal digits, i.e. one 16-bit UTF-16 unit. Some other languages (not Java) use a capital \UXXXXXXXX (\U0001F60A) for larger code points. In Java you can use:

String emoji = new String(new int[] { 0x1F60A }, 0, 1);

This uses a code point array of just one code point.

et.setText("This is a text with emoji " + emoji);

Whether the emoji is shown depends on the font.
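
As a minimal sketch of what that constructor produces: a code point above U+FFFF is stored as two UTF-16 chars (a surrogate pair), so the resulting string has length 2 but contains one code point. The class name CodePointDemo and the helper fromCodePoint are made up for this illustration.

```java
public class CodePointDemo {
    // Hypothetical helper: builds a one-code-point string, as in the answer above.
    static String fromCodePoint(int codePoint) {
        return new String(new int[] { codePoint }, 0, 1);
    }

    public static void main(String[] args) {
        String emoji = fromCodePoint(0x1F60A);
        // Above U+FFFF, one code point occupies two UTF-16 chars (a surrogate pair).
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.println(emoji.equals("\uD83D\uDE0A"));            // true
        // Character.toChars is an equivalent way to get the same chars.
        System.out.println(emoji.equals(new String(Character.toChars(0x1F60A)))); // true
    }
}
```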

After UPDATE in question:

Case: the string contains a backslash, 'u' and 4 to 5 hexadecimal digits.

String s = "This is with \\u263A, continuing with another \\u1F60A, and \\u263A!";

Note that in a Java string literal "\u1F60A" would be two code points: '\u1F60' followed by 'A'. So the convention above is self-made, merely similar to Java's Unicode u-escaping; the string s contains the raw characters \u1F60A exactly as written.
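
The difference between the two spellings can be checked directly. This is only an illustrative sketch; the class and constant names are invented for the example.

```java
public class EscapeLengthDemo {
    // The compiler reads only four hex digits after the backslash-u,
    // so this literal is U+1F60 (a Greek Extended letter) followed by a plain 'A'.
    static final String LITERAL = "\u1F60A";

    // With the backslash doubled, the string holds the raw seven characters.
    static final String RAW = "\\u1F60A";

    public static void main(String[] args) {
        System.out.println(LITERAL.length());            // 2
        System.out.println(LITERAL.charAt(0) == 0x1F60); // true
        System.out.println(LITERAL.charAt(1) == 'A');    // true
        System.out.println(RAW.length());                // 7
        System.out.println(RAW.charAt(0) == '\\');       // true
    }
}
```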

To translate s into a full Unicode string:

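import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches a backslash, 'u' and 4 to 5 hex digits, ending at a word boundary.
Pattern pattern = Pattern.compile("\\\\u([0-9A-Fa-f]{4,5})\\b");
StringBuffer sb = new StringBuffer();
Matcher m = pattern.matcher(s);
while (m.find()) {
    int cp = Integer.parseInt(m.group(1), 16);
    // BMP values fit in one char; higher code points become a surrogate pair.
    String added = cp < 0x10000
        ? String.valueOf((char) cp)
        : new String(new int[] { cp }, 0, 1);
    // Quote the replacement so a decoded '$' or backslash is taken literally.
    m.appendReplacement(sb, Matcher.quoteReplacement(added));
}
m.appendTail(sb);
s = sb.toString();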
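
For the loop over comments sketched in the question, the same replacement could be wrapped in a method and called once per string. The following is a sketch under assumptions, not a definitive implementation: the names EscapeDecoder and decode are hypothetical, and the sample comments are invented. Two escaped surrogates written back to back (\ud83c\udf4e) end up adjacent in the result and therefore form one valid pair.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDecoder {
    // Same self-made convention as above: backslash, 'u', then 4 or 5 hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9A-Fa-f]{4,5})\\b");

    // Hypothetical helper: replaces every escape with the character(s) it names.
    static String decode(String text) {
        StringBuffer sb = new StringBuffer();
        Matcher m = ESCAPE.matcher(text);
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1), 16);
            // toChars yields one char for BMP values and a surrogate pair otherwise.
            String added = new String(Character.toChars(cp));
            // Quote the replacement so a decoded '$' or backslash is taken literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(added));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the fetched comments.
        List<String> comments = Arrays.asList(
                "Smile \\u263A, then \\u1F60A, then \\u263A!",
                "\\ud83c\\udf4e\\ud83c\\udf4e");
        for (String comment : comments) {
            System.out.println(decode(comment));
        }
    }
}
```

Because the pattern ends in \b, an escape immediately followed by a letter, digit or underscore (for example \ude08yo) is left untouched. Dropping the \b would match such input, but then a 4-digit escape followed by a hex letter becomes ambiguous with a 5-digit one.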

edited Jun 15 '16 at 14:19

answered Jun 14 '16 at 12:30

Joop Eggen


It works but how can I do this when the text is random, not explicitly given as above. How can I do this with a random text. I mean how can I represent `\uXXXX` as `0xXXXX` in a text that I don't know what it contains? – user19922 Jun 14 '16 at 12:44
Java holds text internally as Unicode, so the only problem arises when reading a file (= binary data). A text file must be in a Unicode encoding such as UTF-8, and the Java code reading it should then specify "UTF-8". Editing poses no problem (on the programming side). – Joop Eggen Jun 14 '16 at 12:57
Thank you, but I think you misunderstood my comment. Here is a better explanation: "For example, if I want to display the comments that are in a Facebook post. These comments contain random strings that might include emoticons. In such a case, supposing that I have the required mechanism to get these comments, how can I use the answer given below when the Unicode escapes are part of a more general string? How can I convert any single escape in the way described in the accepted answer?" – user19922 Jun 15 '16 at 10:00
Could you help me understand the issue a bit more clearly? So you have a String with emoticons (real Unicode code points like U+1F60A). Do you want to find the emoticons? `str.codePoints().filter(cp -> Character.UnicodeBlock.EMOTICONS.equals(Character.UnicodeBlock.of(cp))).count()` or such. – Joop Eggen Jun 15 '16 at 10:26
Please have a look to the question update, there is a pseudocode of what I want to do. Hope it is more clear now, thank you! – user19922 Jun 15 '16 at 12:13
I hope I have now understood the case; otherwise please be patient with me. It is a complicated issue: for instance, instead of writing `class` in a Java source file one may write `\u0063lass`. – Joop Eggen Jun 15 '16 at 13:00
I tried your updated answer and it is really close to the solution. But, it failed with this string `\ud83c\udf4e\ud83c\udf4e\ud83c\udf4e\ud83c\udf4e`. Here, `\ud83c\udf4e` is an emoji, so there are 4 emojis (the same emoji repeated 4 times) which follow each other without space. But when I display the string I get: `(emoji)?(emoji)?`. So I only get the 1st and 3rd emoji. The 2nd and 4th are not recognized. Any idea about this? P.s. So thank you so much man :) – user19922 Jun 15 '16 at 14:02
Then that is probably a code point represented as 2 UTF-16 chars, a so-called surrogate pair. I'll try to add that case. – Joop Eggen Jun 15 '16 at 14:15
It's getting closer and closer... When I use this `print(StringEscapeUtils.unescapeJava(s))` to make the print, the last problem still is. When I use this `print(s)` the last problem is solved. But this string `\u1F608\u1F620\uDE20` (3 emojis here) fails with both of them. The 1st one shows `(emoji)?(emoji)`, the 2nd one `(emoji)(emoji)?`. – user19922 Jun 15 '16 at 14:41
http://www.fileformat.info/info/unicode/char/de20/index.htm mentions that U+DE20 is not valid Unicode. So the 2nd was right. (If that site was right.) – Joop Eggen Jun 15 '16 at 15:22
This code worked for me when there was a space between the Unicode escape and the following text (e.g. `\ude08 yo`). When the space was removed (e.g. `\ude08yo`), the escape was not recognized and the whole thing was displayed as plain text. How can I solve this issue? – Ameen Maheen Mar 08 '18 at 07:37
