Replace non english character in a string with utf-8 character in Android / Java

Question

I need to convert non-English characters in a string to the \uXXXX escape format.

For example, in BetalingsMåde the problematic character is å, which needs to become \u00e5.

I've tried everything, even

updateRequest = updateRequest.replaceAll("[^\\p{ASCII}]", "");

but this only removes the non-English characters instead of escaping them.

I also need to send this request as an HTTP POST. I tried

setRequestProperty("content-type","application/json;charset=utf-8");

with no luck, so an answer that covers that as well would be welcome.

Thanks in advance!

See [Convert International String to \u Codes in java](http://stackoverflow.com/questions/6230190/convert-international-string-to-u-codes-in-java) — Robert, May 12 '16 at 14:20
@Robert Specifically this answer: http://stackoverflow.com/a/27359340/5221149 — Andreas, May 12 '16 at 14:33
Don't know what API you're using, but setting the `content-type` directly like that may not cause the API to actually serialize the text as UTF-8. You may have to call a specific method on the API to cause that to happen. — Andreas, May 12 '16 at 14:39

score 3 · Accepted Answer · answered May 12 '16 at 15:35

If you want to convert to a Unicode-escaped string, you can do this:

org.apache.commons.lang3.StringEscapeUtils.escapeJava("Your string to escape");

It's part of the Apache Commons Lang 3 Package.

Joao Esperancinha

Thanks, this answer saved the day! – ravenns May 13 '16 at 08:02

score 0 · Answer 2 · answered May 12 '16 at 14:31

In Java, String and char already hold Unicode text. However, something may have gone wrong earlier: a garbled String always means the point of entry has to be corrected.

Hard-coded strings in Java source code require the compiler and the editor to use the same encoding. Nowadays I would fix the IDE's encoding to UTF-8.

Properties files are by default restricted to ISO-8859-1, meaning one should use \uXXXX escapes for any character outside that range.
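As an illustration of the \uXXXX convention in properties files, here is a minimal sketch. The class name, the method name, and the payment.label key are made up for this example; it relies only on java.util.Properties.load, which decodes such escapes while loading.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class, not part of the original answer.
class PropertiesEscapeDemo {

    // Parses properties text and returns the value for the given key.
    // Properties.load turns backslash-u escapes into the real characters.
    static String readLabel(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The file content stays pure ASCII; the escape stands in for the letter.
        String fileContent = "payment.label=BetalingsM\\u00e5de\n";
        String label = readLabel(fileContent, "payment.label");
        System.out.println(label.equals("BetalingsM\u00e5de")); // true
    }
}
```

In a real application the text would come from a .properties file rather than a string literal.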

Files must be read with the file's encoding specified explicitly. Often there is an overloaded method without an encoding parameter; avoid it. The old FileReader/FileWriter should not be used either: they use the current platform encoding, which is not portable.
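A minimal sketch of reading with an explicit encoding follows. The class and method names are hypothetical, and an in-memory stream stands in for a file so the example is self-contained; with a real file one would pass a FileInputStream instead. It also shows what decoding the same bytes with the wrong charset produces.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class ExplicitCharsetDemo {

    // Reads the whole stream as text, decoding with the charset given by the caller
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "BetalingsM\u00e5de".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("BetalingsM\u00e5de"));        // true
        // The two UTF-8 bytes of the letter are decoded as two separate Latin-1 characters.
        System.out.println(wrong.equals("BetalingsM\u00c3\u00a5de"));  // true
    }
}
```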

Text from the database is only problematic if the database was defined with the wrong encoding, or if the JDBC driver communicates using a different encoding.

I am not sure whether you want the following, which does roughly what the native2ascii tool does.

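// Keeps ASCII characters as they are and replaces every other char (and NUL)
// with a backslash-u escape of its UTF-16 code unit.
String toAscii(String s) {
    StringBuilder sb = new StringBuilder(s.length());

    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        if (0 < ch && ch < 128) {
            sb.append(ch); // append the char itself, not its numeric code
        } else {
            sb.append(String.format("\\u%04x", (int) ch));
        }
    }
    return sb.toString();
}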

More likely, use setRequestProperty("content-type","text/json;charset=utf-8"); so that the charset is actually applied (text type). Or, even more likely, the problem is on the response, not the request.
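Whichever media type is declared, the header only describes the body; the bytes written to the connection still have to be produced as UTF-8, as the comment on the question points out. The following is a minimal sketch assuming java.net.HttpURLConnection (the question does not say which HTTP API is in use). The class and method names are hypothetical, and the endpoint URL is supplied by the caller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class, not part of the original answer.
class Utf8PostSketch {

    // Encodes the JSON text as UTF-8 bytes explicitly, independent of the platform default.
    static byte[] utf8Body(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Sends the JSON as a POST body and returns the HTTP status code.
    static int post(URL endpoint, String json) throws IOException {
        byte[] body = utf8Body(json);
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // The header declares the encoding; the bytes below must match it.
            conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

If the server accepts UTF-8 sent this way, escaping the string beforehand may turn out to be unnecessary.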
