Replace Unicode escapes with the corresponding character

Question

I'm trying to convert Unicode escapes, such as \u00FC, to the characters they represent.

import javax.swing.JOptionPane;

public class Test {
    public static void main(String[] args) {
        String in = JOptionPane.showInputDialog("Write something in here");
        System.out.println("Input: " + in);
        // Do something before this line
        String out = in;
        System.out.print("And Now: " + out);
    }
}

An example to explain what I mean:

First Console line: Input: Hall\u00F6

Second Console line: And Now: Hallö

EDIT: The Trombone Willy's answer sometimes failed on input containing multiple Unicode escapes, so here is a fixed version of the code:

public static String unescapeUnicode(String s) {
    StringBuilder r = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        if (s.length() >= i + 6 && s.substring(i, i + 2).equals("\\u")) {
            r.append(Character.toChars(Integer.parseInt(s.substring(i + 2, i + 6), 16)));
            i += 5;
        } else {
            r.append(s.charAt(i));
        }
    }
    return r.toString();
}

Well, if you enter "Hall\u00F6" when my code starts, it writes "Hall\u00F6" to the console both times, but I want the second line to show "Hallö", because "\u00F6" is the Unicode escape for "ö". – LeWimbes, May 28 '16 at 18:13
You'd need to explicitly parse those out. Escape sequences like `\uXXXX` are only translated by the compiler in Java source code; text read at run time, such as dialog or console input, is not processed that way. [This lightly touches on it](http://stackoverflow.com/questions/1327355/is-there-a-java-function-which-parses-escaped-characters) – Obicere, May 28 '16 at 18:16
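
To illustrate the point made in the comment above, here is a minimal sketch. The class name `EscapeDemo` and its two constants are made up for this illustration. It compares a literal whose escape the compiler translates with a string that contains the six escape characters as plain text, which is what the program receives when a user types them into the dialog.

```java
// Hypothetical demo class; the name is not from the original post.
class EscapeDemo {
    // The compiler translates the escape in this literal, so the string holds 5 characters.
    static final String COMPILED = "Hall\u00F6";
    // Doubling the backslash keeps the escape as literal text (10 characters),
    // which matches what arrives when a user types the escape at run time.
    static final String TYPED = "Hall\\u00F6";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());       // 5
        System.out.println(TYPED.length());          // 10
        System.out.println(COMPILED.equals(TYPED));  // false
    }
}
```

The second string is the one that needs explicit parsing before it prints as "Hallö".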

score 6 · Accepted Answer · edited May 14 '22 at 15:57

Joao's answer is probably the simplest, but this function can help when you don't want to pull in the Apache Commons JAR, whether for space or portability reasons, or because you just don't want to deal with licenses or other Apache cruft. Also, since it does much less, I'd expect it to be faster. Here it is:

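public static String unescapeUnicode(String s) {
    StringBuilder sb = new StringBuilder();

    // Start of the text that has not been copied into sb yet.
    int oldIndex = 0;

    // A full escape needs six characters (backslash, 'u', four hex digits),
    // so stop scanning once fewer than six remain.
    for (int i = 0; i + 6 <= s.length(); i++) {
        if (s.startsWith("\\u", i)) {
            // Copy the plain text that precedes the escape.
            sb.append(s, oldIndex, i);
            // parseInt throws NumberFormatException if these four characters are not hex digits.
            int codePoint = Integer.parseInt(s.substring(i + 2, i + 6), 16);
            sb.append(Character.toChars(codePoint));

            // Skip the rest of the escape; the loop's i++ steps past its last digit.
            i += 5;
            oldIndex = i + 1;
        }
    }

    // Copy whatever follows the last escape.
    sb.append(s.substring(oldIndex));

    return sb.toString();
}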

I hope this helps! (You don't have to give me credit for this; I release it into the public domain.)
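
One caveat with parsing the digits via `Integer.parseInt` is that input such as `C:\users` contains a backslash followed by `u` without four hex digits after it, which makes the call throw. Below is a hedged sketch of a more defensive variant, not the answer author's code. It assumes malformed escapes should be left untouched rather than rejected. The class name `UnicodeUnescaper` and the helper `hexValueAt` are hypothetical names chosen for this example.

```java
// Hypothetical class name; a sketch under the assumption that malformed escapes stay as-is.
final class UnicodeUnescaper {
    private UnicodeUnescaper() {}

    /** Replaces every backslash-u sequence that is followed by four hex digits. */
    static String unescapeUnicode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int value = hexValueAt(s, i);
            if (value >= 0) {
                sb.append((char) value); // four hex digits always fit into one char
                i += 6;                  // skip backslash, 'u', and the four digits
            } else {
                sb.append(s.charAt(i));  // ordinary character or malformed escape
                i++;
            }
        }
        return sb.toString();
    }

    /** Returns the 16-bit value of a well-formed escape starting at index i, or -1 if there is none. */
    private static int hexValueAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return -1;
        }
        int value = 0;
        for (int k = i + 2; k < i + 6; k++) {
            // Character.digit returns -1 for characters that are not digits in radix 16
            // (note that it also accepts non-ASCII Unicode digits).
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnicode("Hall\\u00F6"));    // escape is replaced
        System.out.println(unescapeUnicode("C:\\users\\u12")); // left unchanged
    }
}
```

Characters outside the Basic Multilingual Plane arrive as two consecutive escapes (a surrogate pair); appending each half as a `char` reassembles them in the resulting string.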

score 4 · Answer 2 · answered May 28 '16 at 18:26

Try this:

StringEscapeUtils.unescapeJava("Hall\\u00F6")

Joao Esperancinha

In which API do we find this class? What does it do exactly? A little explanation does not hurt here. – Yassin Hajaj May 28 '16 at 18:27
It is found in commons-lang, with Maven coordinates groupId `org.apache.commons`, artifactId `commons-lang3`, version `${commons.lang3.version}`. – Joao Esperancinha May 28 '16 at 18:28
Essentially, it unescapes any Java string literal, including Unicode escapes. Check the API here: [apache-commons-lang3](https://commons.apache.org/proper/commons-lang/javadocs/api-3.4/org/apache/commons/lang3/StringEscapeUtils.html#unescapeJava%28java.lang.String%29) – Joao Esperancinha May 28 '16 at 18:34
You can also find it in the Maven Repository [here](http://mvnrepository.com/artifact/org.apache.commons/commons-lang3). – Joao Esperancinha May 28 '16 at 18:40
Thank you, too. But because of the licenses and exporting stuff I will use the code from the other answer for now. – LeWimbes May 28 '16 at 18:59
The preferred version of `StringEscapeUtils` is now in commons-text, not commons-lang3. – Donal Fellows Jan 21 '19 at 10:58
