
I'm trying to write Java code that takes a UTF-8 string containing an emoji and replaces that emoji with text. For example:

I have this text: 😀 طلبت منهم مبالغ كبيرة لإتمام دراستهم (Arabic, roughly "she asked them for large sums of money to complete their studies")

and I want it to become: grinningFace طلبت منهم مبالغ كبيرة لإتمام دراستهم

I tried this:

String string = "";
try {
    byte[] utf8Bytes = string.getBytes("UTF-8");
    string = new String(utf8Bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
string = string.replaceAll("[\u1F600]", "grinningF");
// also tried "\u1F600" and "u1F600"
System.out.println(string);

But it didn't work. How can I do this?

Gholamali Irani
Lama
  • I strongly suspect you don't understand what UTF-8 is... it's not clear why you expected that code to convert an emoji into the text "grinningFace". Additionally, in Java there's no such thing as "a UTF-8 string"... there's just a string which is a sequence of UTF-16 code units. – Jon Skeet Dec 09 '17 at 17:26
  • Check this one: https://stackoverflow.com/questions/34802721/is-there-anyway-to-convert-emoji-to-text-in-java – Arvind Katte Dec 09 '17 at 17:28
  • @JonSkeet Then how do I convert the emoji to grinningFace? :) – Lama Dec 09 '17 at 17:31
  • You'd need a mapping from each emoji to its name - I don't expect that to be anywhere within Java itself. You should be able to use `String.replace` with the right sequence though - I'd expect it to be `string = string.replace("\uD83D\uDE00", "grinningFace");` – Jon Skeet Dec 09 '17 at 17:32
  • @JonSkeet It worked! Where did you get "\uD83D\uDE00" from? – Lama Dec 09 '17 at 17:40
  • @Lama: I used http://csharpindepth.com/Articles/General/Unicode.aspx#explorer - the "Unicode explorer" lets you paste text in, then see the Unicode code point, UTF-16 code units and UTF-8 bytes of each character. – Jon Skeet Dec 09 '17 at 17:44
  • BTW, your `try` block doesn't do anything, and it can't even throw UnsupportedEncodingException, because UTF-16 and UTF-8 are both encodings of the full Unicode character set. – Tom Blodget Dec 10 '17 at 16:49
  • @JonSkeet Regarding _"You'd need a mapping from each emoji to its name"_, that emoji-to-Unicode-name mapping does exist in Java; just call `Character.getName()` with a code point. For example, `System.out.println(Character.getName("😀".codePointAt(0)));` prints "GRINNING FACE". – skomisa Dec 20 '19 at 20:14
  • Voting to reopen. This question was closed as a duplicate of a question that specifically required a regex: (_"What is the regex to extract all the emojis from a string?"_), but there is no need to use a regex to answer this question at all, and it doesn't mention requiring a regex as the solution. – skomisa Dec 21 '19 at 06:55
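Building on skomisa's `Character.getName()` suggestion, here is a minimal sketch that turns every emoji in a string into a camelCase name such as grinningFace. It assumes that "emoji" can be approximated as "any code point outside the BMP", which is a simplification: some emoji (for example U+2764, HEAVY BLACK HEART) are inside the BMP, and not every supplementary character is an emoji. The class and method names are hypothetical.

```java
import java.util.Locale;

class EmojiToName {
    // Replaces each supplementary code point (an assumed, rough test for
    // "is an emoji") with a camelCase form of its Unicode character name.
    static String replaceEmojiWithNames(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            // Character.getName returns null for unassigned code points.
            String name = Character.isSupplementaryCodePoint(cp) ? Character.getName(cp) : null;
            if (name != null) {
                out.append(toCamelCase(name));
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Turns a Unicode name such as "GRINNING FACE" into "grinningFace".
    static String toCamelCase(String unicodeName) {
        String[] words = unicodeName.toLowerCase(Locale.ROOT).split("[ -]+");
        StringBuilder sb = new StringBuilder(words[0]);
        for (int i = 1; i < words.length; i++) {
            sb.append(Character.toUpperCase(words[i].charAt(0)))
              .append(words[i].substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escapes keep this source file ASCII-only.
        String text = "\uD83D\uDE00 hello";
        System.out.println(replaceEmojiWithNames(text)); // grinningFace hello
    }
}
```

On JDK 21 and later, `Character.isEmoji(int)` is probably a better test than the BMP check used here.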

1 Answer


Your problem is that the emoji is not in the Basic Multilingual Plane (BMP), because its code point is greater than U+FFFF. A Java char is only 16 bits wide, so only characters in the BMP fit in a single Java char. Characters outside the BMP are stored in UTF-16 as a surrogate pair, i.e. two Java chars.
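A small sketch that makes the BMP distinction visible, comparing an Arabic letter (U+0627, ARABIC LETTER ALEF) with the grinning face. The class and method names are illustrative only.

```java
import java.util.Locale;

class BmpDemo {
    // Reports whether a code point is in the BMP and how many Java chars
    // (UTF-16 code units) it needs.
    static String describe(int codePoint) {
        return String.format(Locale.ROOT, "U+%04X bmp=%b chars=%d",
                codePoint,
                Character.isBmpCodePoint(codePoint),
                Character.charCount(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0627));  // U+0627 bmp=true chars=1
        System.out.println(describe(0x1F600)); // U+1F600 bmp=false chars=2

        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 chars
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
    }
}
```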

The emoji 😀 is the GRINNING FACE character, U+1F600. Its UTF-8 encoding is 0xf0, 0x9f, 0x98, 0x80, and its UTF-16 encoding is (as Jon Skeet said in his comment) 0xd83d, 0xde00. That means that the internal Java representation of "😀" is "\ud83d\ude00", as a debugger would show.
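The encodings quoted above can be checked from Java itself, which is one way to find the escape sequence for any emoji without an external tool. A minimal sketch; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class EncodingDemo {
    // UTF-16 code units (the Java chars) of a code point, as hex.
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format(Locale.ROOT, "%04X", (int) c));
        }
        return sb.toString();
    }

    // UTF-8 bytes of a code point, as hex.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format(Locale.ROOT, "%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf16Hex(0x1F600)); // D83D DE00
        System.out.println(utf8Hex(0x1F600));  // F0 9F 98 80
    }
}
```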

So your code should be:

string = string.replaceAll("\ud83d\ude00", "grinningF");

or

string = string.replaceAll("😀", "grinningF");

which is exactly the same, provided the compiler reads the source file with the right encoding.
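A minimal, self-contained sketch of the fix, with hypothetical class and method names. It uses String.replace (literal matching, as Jon Skeet suggested in the comments) rather than replaceAll, so no regex is involved, and it also shows why the question's "[\u1F600]" pattern never matches.

```java
class ReplaceEmoji {
    // Replaces the GRINNING FACE surrogate pair. String is immutable,
    // so callers must use the returned value.
    static String replaceGrinningFace(String text) {
        return text.replace("\uD83D\uDE00", "grinningFace");
    }

    // Same replacement, building the emoji from its code point instead of
    // spelling out the surrogate pair by hand.
    static String replaceByCodePoint(String text) {
        String emoji = new String(Character.toChars(0x1F600));
        return text.replace(emoji, "grinningFace");
    }

    // The question's attempt: a Unicode escape takes exactly four hex digits,
    // so this regex is the class [U+1F60, '0'] and never matches U+1F600.
    static String askersAttempt(String text) {
        return text.replaceAll("[\u1F600]", "grinningFace");
    }

    public static void main(String[] args) {
        String text = "\uD83D\uDE00 hello";
        System.out.println(replaceGrinningFace(text));        // grinningFace hello
        System.out.println(replaceByCodePoint(text));         // grinningFace hello
        System.out.println(askersAttempt(text).equals(text)); // true
    }
}
```

Writing the emoji as escapes (or via Character.toChars) keeps the source file ASCII-only, so the result does not depend on the encoding javac uses to read the file (its -encoding option).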

Serge Ballesta
  • As @JonSkeet mentioned in the comment, this is the right answer: string.replaceAll("\ud83d\ude00", "grinningF"); – Lama Dec 12 '17 at 21:03
  • And this one doesn't work: string.replaceAll("😀", "grinningF"); – Lama Dec 12 '17 at 21:03
  • @Lama This might happen if your Java compiler treats Java source files as ASCII-encoded. Check the encoding in your compiler settings. – izogfif Aug 09 '22 at 13:23