I have UTF-8 literals like this:
String literal = "\x6c\x69b/\x62\x2f\x6d\x69nd/m\x61x\x2e\x70h\x70";
I need to read them and convert them into plain text.
Is there a class I can import in Java that can interpret these?
Thank you.
Java doesn't support UTF-8 literals per se. Java's linguistic support for Unicode is limited to UTF-16 based Unicode escapes.
You can express your UTF-8 characters in a String literal with Unicode escapes as follows:
String literal =
"\u006c\u0069b/\u0062\u002f\u006d\u0069nd/m\u0061x\u002e\u0070h\u0070";
(Assuming no typing errors ...)
Or you could (in this case) simply replace the escapes with the ordinary ASCII characters they stand for.
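As a quick sanity check, here is a small sketch showing that the Unicode-escaped literal and the plain ASCII literal are the same String. The class and field names are made up for illustration. Unicode escapes are translated by the compiler before the source is parsed, so there is no runtime conversion involved.

```java
public class UnicodeEscapeDemo {
    // Translated by the compiler into "lib/b/mind/max.php" before parsing.
    static final String ESCAPED =
        "\u006c\u0069b/\u0062\u002f\u006d\u0069nd/m\u0061x\u002e\u0070h\u0070";

    // The same text written with ordinary ASCII characters.
    static final String PLAIN = "lib/b/mind/max.php";

    public static void main(String[] args) {
        System.out.println(ESCAPED);               // lib/b/mind/max.php
        System.out.println(ESCAPED.equals(PLAIN)); // true
    }
}
```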
Note that converting from UTF-8 to UTF-16 is not normally that simple. (It is simple in this case because the \xnn bytes are all less than 0x80, so each one encodes a single Unicode code point, which is also a single UTF-16 code unit.)
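To illustrate why bytes of 0x80 and above are harder, the sketch below decodes two non-ASCII examples (chosen for illustration, not taken from the question): U+00E9 takes two UTF-8 bytes but becomes one Java char, while U+1F600 takes four UTF-8 bytes and becomes two chars (a surrogate pair). Treating each byte as its own character, as ISO-8859-1 does, gives the wrong result.

```java
import java.nio.charset.StandardCharsets;

public class MultiByteDemo {
    // U+00E9 (e with acute accent): one Java char, two UTF-8 bytes.
    static final byte[] E_ACUTE = {(byte) 0xC3, (byte) 0xA9};

    // U+1F600 (grinning face emoji): one code point, four UTF-8 bytes,
    // two UTF-16 code units (a surrogate pair) in a Java String.
    static final byte[] GRINNING_FACE = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};

    public static void main(String[] args) {
        String e = new String(E_ACUTE, StandardCharsets.UTF_8);
        System.out.println(e.length());        // 1
        System.out.println((int) e.charAt(0)); // 233 (0xE9)

        // Mapping each byte to its own char produces two wrong chars instead of one.
        System.out.println(new String(E_ACUTE, StandardCharsets.ISO_8859_1).length()); // 2

        String face = new String(GRINNING_FACE, StandardCharsets.UTF_8);
        System.out.println(face.length());                         // 2
        System.out.println(face.codePointCount(0, face.length())); // 1
    }
}
```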
Another approach is to represent the UTF-8 as an array of bytes, and convert that to a String; e.g.
byte[] bytes = new byte[]{
0x6c, 0x69, 'b', '/', 0x62, 0x2f, 0x6d, 0x69, 'n', 'd',
'/', 'm', 0x61, 'x', 0x2e, 0x70, 'h', 0x70};
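String str = new String(bytes, StandardCharsets.UTF_8); // import java.nio.charset.StandardCharsets; avoids the checked UnsupportedEncodingException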
(Again, assuming no typing errors.)
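If the \xnn sequences arrive as text at runtime (for example, read from a file), the compiler cannot help, and the escapes have to be parsed by hand. The following is a minimal sketch of that, using a hypothetical helper named HexEscapeDecoder.decode. It assumes every backslash in the input starts a well-formed \x escape with exactly two hex digits; it collects the escaped bytes (and the UTF-8 encoding of every other character) into a byte array, then decodes the whole array as UTF-8 so that multi-byte sequences are reassembled correctly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HexEscapeDecoder {
    /**
     * Hypothetical helper: decodes text containing "\xNN" escapes into a String,
     * treating the escaped bytes and the surrounding characters as UTF-8.
     * Assumes each backslash begins a well-formed escape; malformed hex digits
     * cause Integer.parseInt to throw NumberFormatException.
     */
    public static String decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 3 < text.length() && text.charAt(i + 1) == 'x') {
                // Parse the two hex digits after the backslash-x as one raw byte.
                int b = Integer.parseInt(text.substring(i + 2, i + 4), 16);
                out.write(b);
                i += 4;
            } else {
                // Any other character is re-encoded as its UTF-8 bytes.
                int cp = text.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In Java source the backslash itself must be escaped to get the raw text.
        String raw = "\\x6c\\x69b/\\x62\\x2f\\x6d\\x69nd/m\\x61x\\x2e\\x70h\\x70";
        System.out.println(decode(raw)); // lib/b/mind/max.php

        // A two-byte UTF-8 sequence is reassembled into a single char.
        System.out.println(decode("caf\\xc3\\xa9").length()); // 4
    }
}
```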
If the raw encoded bytes themselves (rather than the \xnn escape text) are in a file, you can use InputStreamReader to convert from whatever charset they are encoded in to a sequence of chars:
InputStream is = ...; // get the input stream however you want
InputStreamReader isr = new InputStreamReader(is, "charset-name"); // e.g. "UTF-8", or StandardCharsets.UTF_8
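A slightly fuller sketch of the reader approach, assuming the data is UTF-8. To keep it self-contained, a ByteArrayInputStream stands in for the file; with a real file you would typically obtain the stream from Files.newInputStream(path) instead. The helper name readAll is made up for illustration. Note that this only decodes genuinely encoded bytes: a file that literally contains backslash-x text would come back with those backslashes intact, and the escapes would still need to be parsed separately.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {
    // Hypothetical helper: reads the whole stream as UTF-8, one line at a time,
    // appending a '\n' after each line.
    static String readAll(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file: the raw UTF-8 bytes of "café".
        byte[] data = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        String text = readAll(new ByteArrayInputStream(data));
        System.out.println(text.length()); // 5 (four chars plus the appended newline)
    }
}
```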