
I need to figure out how to convert a Unicode code point to its escaped form. For example, convert 0x1f604 to "\uD83D\uDE04".

halfer
Sami Issa
  • If I use StringEscapeUtils.escapeJava(String) I can get the escaped code, but I need to get it from an Integer value, as in the example above. – Sami Issa Jan 04 '17 at 14:01
  • Maybe look at this: http://stackoverflow.com/questions/1615559/convert-a-unicode-string-to-an-escaped-ascii-string – Danieboy Jan 04 '17 at 14:03
  • Can you be more specific about the escaping you're looking for (there are several)? In what environment does it need to be valid? And what's the input for the encoding? 32-bit Unicode code points? – Codo Jan 04 '17 at 14:07

1 Answer


It seems you're looking for an escaping that first converts a Unicode code point (a 32-bit integer value) to UTF-16 (one or two 16-bit values), which is the encoding Java uses internally for strings.

Each 16-bit value is then written as a \uXXXX escape, the syntax used by Java and JavaScript.

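public static String encodeCodepoint(int codePoint) {
    // Convert the code point to UTF-16: one char in the BMP, a surrogate pair above it.
    char[] chars = Character.toChars(codePoint);
    StringBuilder sb = new StringBuilder();
    for (char ch : chars) {
        // Write each 16-bit unit as a backslash, 'u' and four uppercase hex digits.
        sb.append(String.format("\\u%04X", (int) ch));
    }
    return sb.toString();
}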

The following code:

System.out.println(encodeCodepoint(0x1f604));

outputs:

\uD83D\uDE04
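
To show what Character.toChars is doing for code points above 0xFFFF, here is a minimal sketch that computes the surrogate pair by hand. The class name SurrogateSketch and the method escapeManually are made up for this illustration; in real code, Character.toChars as used above is the simpler choice.

```java
final class SurrogateSketch {

    // Hypothetical helper: escapes a code point, doing the UTF-16 split by hand.
    static String escapeManually(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a valid code point: " + codePoint);
        }
        if (codePoint < 0x10000) {
            // Code points in the Basic Multilingual Plane fit into one 16-bit unit.
            return String.format("\\u%04X", codePoint);
        }
        int offset = codePoint - 0x10000;     // leaves a 20-bit value
        int high = 0xD800 + (offset >> 10);   // top 10 bits: high surrogate
        int low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits: low surrogate
        return String.format("\\u%04X\\u%04X", high, low);
    }

    public static void main(String[] args) {
        System.out.println(escapeManually(0x1f604)); // prints the escapes D83D and DE04
    }
}
```

For 0x1f604 the offset is 0xF604, so the high surrogate is 0xD800 + 0x3D = 0xD83D and the low surrogate is 0xDC00 + 0x204 = 0xDE04, which matches the output above.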
Codo
  • Many thanks @Codo. I'll check it and let you know. Thanks in advance!! – Sami Issa Jan 04 '17 at 14:24
  • Your code works perfectly, but now I have another problem. The result of encodeCodepoint(int codePoint) is used by other code to find and replace the escaped Unicode. Example: String text = "\uD83D\uDE04"; text.replace(encodeCodepoint(0x1f604), "<1f604>"); but it doesn't replace it! Any ideas? Thanks a lot! – Sami Issa Jan 04 '17 at 14:59
  • Just an idea: If you inspect a string with an emoji in a debugger, it will show "\uD83D\uDE04". That doesn't mean it really contains escaped unicode data. That's just the debugger's way of displaying it. – Codo Jan 04 '17 at 16:46
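
A minimal sketch of what is probably going on in the replace example, assuming the text holds the real emoji rather than escaped text. In Java source, the literal "\uD83D\uDE04" is compiled to two chars, so searching it for the twelve-character escaped text finds nothing. The class name ReplaceSketch and the helper fromCodePoint are made up for this illustration.

```java
final class ReplaceSketch {

    // Hypothetical helper: the real string for a code point, not its escaped text.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // The compiler turns this literal into two chars (one emoji), not twelve.
        String text = "\uD83D\uDE04";
        System.out.println(text.length()); // prints 2

        // Strings are immutable, so the result of replace has to be kept.
        String replaced = text.replace(fromCodePoint(0x1f604), "<1f604>");
        System.out.println(replaced); // prints <1f604>
    }
}
```

If the text really does contain a literal backslash followed by u and hex digits, for example because it was read from a file, then searching for the output of encodeCodepoint would be the right approach; the return value of replace still needs to be assigned.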
  • Terrific, simple code. How would you perform the inverse operation, i.e. unescaping of your method's output? – Bliss Jun 19 '19 at 19:29
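
One possible way to do the inverse, sketched under the assumption that the input uses exactly the format produced by encodeCodepoint (a backslash, u, and four hex digits per 16-bit unit). The class name UnescapeSketch and the method unescape are made up for this illustration; the sketch does not handle other Java escapes such as \n.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnescapeSketch {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: turns each escape back into the char it names.
    static String unescape(String escaped) {
        Matcher m = ESCAPE.matcher(escaped);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(escaped, last, m.start());                // text before the escape
            sb.append((char) Integer.parseInt(m.group(1), 16)); // the decoded 16-bit unit
            last = m.end();
        }
        sb.append(escaped, last, escaped.length());             // trailing text
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = unescape("\\uD83D\\uDE04");
        System.out.println(Integer.toHexString(s.codePointAt(0))); // prints 1f604
    }
}
```

Because the two surrogates end up next to each other in the result, codePointAt(0) reads them back as the single code point 0x1f604.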