How do I put the Unicode character U+1F604 in a Java String? I tried

String s = "\u1F604";

but it is equivalent to

String s = "\u1F60"+"4";

so the escape was split into two chars.
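
The split can be confirmed with a small check. This is only a sketch, and the class and method names are made up for illustration; it shows that the compiler takes the first four hex digits as one char and treats the trailing 4 as an ordinary character:

```java
class EscapeSplitDemo {
    // Returns the literal from the question. A Unicode escape consumes exactly
    // four hex digits, so this is the char U+1F60 followed by the digit '4'.
    static String brokenLiteral() {
        return "\u1F604";
    }

    public static void main(String[] args) {
        String s = brokenLiteral();
        System.out.println(s.length());                       // 2
        System.out.println(Integer.toHexString(s.charAt(0))); // 1f60
        System.out.println(s.charAt(1));                      // 4
    }
}
```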

Yeezh
  • There are several existing questions on SO which address your concern. As well as the linked duplicate, some answers to the question [Manually converting unicode codepoints into UTF-8 and UTF-16](https://stackoverflow.com/q/6240055/2985643) provide very helpful explanations of how encoding to UTF-8 and UTF-16 works. – skomisa Jul 09 '21 at 22:56

5 Answers

DuncG's answer is a good way of doing it. The short explanation is that a \u escape in a string literal takes exactly four hex digits (\u####), so it can only express a single 16-bit char, that is, a code point up to U+FFFF. Code points above that, which include most emoji, are represented in UTF-16 as surrogate pairs. Unicode reserves U+D800 to U+DFFF for these pairs: 1024 high surrogates times 1024 low surrogates, which covers the 1,048,576 supplementary code points.
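
These facts can be checked with the standard java.lang.Character helpers. The following is a minimal sketch; the class name and the unitsFor helper are invented for the example:

```java
class SurrogateFacts {
    // Number of UTF-16 code units (Java chars) needed for a code point: 1 or 2.
    static int unitsFor(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x1F604;
        System.out.println(Character.isSupplementaryCodePoint(cp)); // true
        System.out.println(unitsFor(cp));                           // 2
        System.out.println(unitsFor('A'));                          // 1
        // Both halves fall inside the reserved range U+D800..U+DFFF.
        System.out.printf("%04X %04X%n",
                (int) Character.highSurrogate(cp),
                (int) Character.lowSurrogate(cp));                  // D83D DE04
    }
}
```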

A different way of doing it, which doesn't require working out the surrogate pair by hand, is to use Character.toChars(...):

public class Main {
	public static void main(String[] args) {
		String s = "Hello " + new String(Character.toChars(0x1f604)) + "!";
		System.out.println(s);
	}
}

hyper-neutrino

A third variant is to build the string directly from the code point, most simply with Character.toString(0x1f604):

public class Main {
  public static void main(String[] args) {
    String s1 = "Hello " + Character.toString(0x1f604) + "!"; // Java 11 and later
    String s2 = "Hello " + new String(new int[]{0x1f604}, 0, 1) + "!"; // before Java 11
    System.out.println(s1 + " " + s2);
  }
}

(Note that some other languages accept an eight-digit escape such as \U0001f604. Java does not: it only has \u followed by exactly four hex digits, and \U is not a valid escape.)
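
When several code points have to be combined, or when Java 11 is not available, StringBuilder.appendCodePoint is another option. This is a sketch only; fromCodePoints is a hypothetical helper, and U+1F44D is just a second sample code point:

```java
class CodePointStrings {
    // Builds a String from any number of code points; works before Java 11.
    static String fromCodePoints(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            sb.appendCodePoint(cp); // appends one or two chars as needed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Hello " + fromCodePoints(0x1F604, 0x1F44D) + "!";
        System.out.println(s.length());                      // 11
        System.out.println(s.codePointCount(0, s.length())); // 9
    }
}
```

The two numbers differ because length() counts chars (UTF-16 code units), while codePointCount counts whole characters.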

Joop Eggen

The UTF-16 encoding of your character U+1F604 is 0xD83D 0xDE04, so it should be:

String s = "\uD83D\uDE04";
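
One way to arrive at those two values is the standard UTF-16 rule: subtract 0x10000, then put the top 10 bits on 0xD800 and the bottom 10 bits on 0xDC00. The sketch below spells that out; the class and the toSurrogates helper are illustrative names, and in real code Character.toChars does the same job:

```java
class SurrogateMath {
    // Splits a supplementary code point (U+10000..U+10FFFF) into its
    // UTF-16 high and low surrogates.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));  // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] {high, low};
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1F604);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE04
        System.out.println(new String(pair).equals("\uD83D\uDE04"));    // true
    }
}
```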
DuncG
  • This is not a good answer for several reasons: [1] You haven't explained or shown how to determine the UTF-16 encoding from the code point. You've just presented it as a _fait accompli_. [2] This is not a general solution. You have hard-coded the solution for that specific codepoint only. [3] Since Java 11 there is a better and simpler approach, as shown in the answer from [Joop Eggen](https://stackoverflow.com/a/68258582/2985643), which can easily be tweaked to work for any codepoint in any plane. – skomisa Jul 09 '21 at 22:44

You can put this smiley face into a string in several ways: as the symbol itself, as a UTF-16 surrogate pair (written as escapes or as numeric char values), or by converting its supplementary code point.

// symbol itself
String str1 = "😄";
// surrogate pair
String str2 = "\uD83D\uDE04";
// surrogate pair combined into its supplementary code point
int cp = Character.toCodePoint('\uD83D', (char) 0xDE04);
// since Java 11: code point held in an int variable
String str3 = Character.toString(cp);
// since Java 11: code point written as a hexadecimal literal
String str4 = Character.toString(0x1f604);

// output
System.out.println(str1 + " " + str2 + " " + str3 + " " + str4);

Output:

😄 😄 😄 😄
  • But if you do this I think you have to compile it with a command line flag, see https://stackoverflow.com/a/25541706/751579 – davidbak Feb 07 '23 at 03:00

If you have the hexadecimal value of a character as a string, you can parse it into a number with the Integer.parseInt method, using radix 16.

// surrogate pair
char high = (char) Integer.parseInt("D83D", 16);
char low = (char) Integer.parseInt("DE04", 16);
String str1 = new String(new char[]{high, low});

// supplementary code point
int cp = Integer.parseInt("1F604", 16);
char[] chars = Character.toChars(cp);
String str2 = new String(chars);

// since 11
String str3 = Character.toString(cp);

// output
System.out.println(str1 + " " + str2 + " " + str3);

Output: