0

I have UTF-8 literals like this:

String literal = "\x6c\x69b/\x62\x2f\x6d\x69nd/m\x61x\x2e\x70h\x70";

I need to read them and convert them into plain text.

Is there a library or class in Java that can interpret these?

Thank you.

Kyle
  • This constant is impossible: Java has no escape sequence like "\x". A Unicode string looks like "\\u0444\\u044B\\u0432". – svaor Nov 04 '11 at 05:25
  • What real text is coded in your literals? – svaor Nov 04 '11 at 05:26
  • http://www.utf8-chartable.de/unicode-utf8-table.pl?names=-&utf8=string-literal&unicodeinhtml=hex look at the literals they do exist – Kyle Nov 04 '11 at 05:30
  • I mean - just try to compile your code. – svaor Nov 04 '11 at 05:35
  • Oh, in my real application problem this text isn't in the program. I just tried to make it short and sweet for stack. I'm trying to figure out a conversion function to start. I can't create one. This text is inside text files in a directory in which I planned to recursively read, convert text, and save as filename.cleaned.txt – Kyle Nov 04 '11 at 05:38
  • Ok, I see. Look at perl unescapes at http://stackoverflow.com/questions/3537706/howto-unescape-a-java-string-literal-in-java – svaor Nov 04 '11 at 05:45
  • 1
    @Kyle - the best thing to do (IMO) is to ask that as a new Question. And take the time to make sure that you are asking your question clearly using correct Java terminology. (You are NOT asking about Java literals, and you are not asking about Java imports ... so don't use those terms.) – Stephen C Nov 04 '11 at 05:46
  • Apologies Stephen. Thank you for being polite with this matter. I will redo :) – Kyle Nov 04 '11 at 05:52

2 Answers

4

Java doesn't support UTF-8 literals per se. The language's support for Unicode in source code is limited to UTF-16 based Unicode escapes.

You can express your UTF-8 characters in a String literal with Unicode escapes as follows:

String literal = 
    "\u006c\u0069b/\u0062\u002f\u006d\u0069nd/m\u0061x\u002e\u0070h\u0070";

(Assuming no typing errors ...)

or you could (in this case) replace the escapes with normal ASCII characters.

Note that converting UTF-8 to UTF-16 is not normally that simple. (It is simple in this case because every \xnn value is less than 0x80, so each one is a single byte that maps directly to a single Unicode code point and a single UTF-16 code unit.)
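
As a rough illustration of why values of 0x80 and above are harder, here is a minimal sketch. The class and method names (Utf8WidthDemo, decodeUtf8, byteIsChar) are made up for this example; it compares proper UTF-8 decoding with the naive "one byte is one character" shortcut on a two-byte sequence.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class Utf8WidthDemo {
    // Decodes the raw bytes as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Naive conversion: treats every byte as one character.
    // This only matches UTF-8 decoding when all bytes are below 0x80.
    static String byteIsChar(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\xc3\xa9" is the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(decodeUtf8(bytes).length()); // prints 1
        System.out.println(byteIsChar(bytes).length()); // prints 2
    }
}
```

The two bytes form one character when decoded as UTF-8, but the naive loop yields two unrelated characters.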


Another approach is to represent the UTF-8 as an array of bytes, and convert that to a String; e.g.

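import java.nio.charset.StandardCharsets;

// Each element is one UTF-8 byte; the char literals are ASCII, so they fit in a byte.
byte[] bytes = new byte[]{
    0x6c, 0x69, 'b', '/', 0x62, 0x2f, 0x6d, 0x69, 'n', 'd',
    '/', 'm', 0x61, 'x', 0x2e, 0x70, 'h', 0x70};
// StandardCharsets.UTF_8 (Java 7+) avoids the checked UnsupportedEncodingException.
String str = new String(bytes, StandardCharsets.UTF_8);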

(Again, assuming no typing errors.)

Stephen C
  • I made a mistake in describing my goal. The String literal doesn't exist in my application. I have a directory of text files with the escaped text inside them. I would like to decode them into a readable format, or just letters. I was going to create a decoding function to read and save the converted text, but I don't know how to start. – Kyle Nov 04 '11 at 05:44
  • @Kyle see my comment on the Question, – Stephen C Nov 04 '11 at 05:48
  • How can I programmatically convert \xnn characters to their String representation in Java? – Alp Mar 20 '13 at 08:20
  • 1
    By writing some Java code that loops over the characters in the input string looking for the escape sequences ... and turns them into characters for the output string. It is just programming. – Stephen C Mar 20 '13 at 13:09
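
The loop described in the last comment could look roughly like the sketch below. It is one possible reading, not the commenter's own code: the class and method names (XnnUnescaper, unescape, isEscapeAt) are invented here, it assumes each escape is exactly \x followed by two hex digits, and it assumes the escaped bytes are UTF-8. Anything that does not match that pattern is copied through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names and escape format are assumptions.
final class XnnUnescaper {
    // Turns each \xNN escape into the byte NN, copies all other text
    // through as UTF-8 bytes, then decodes the whole buffer as UTF-8.
    static String unescape(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            if (isEscapeAt(input, i)) {
                out.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                int cp = input.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8);
                out.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if a backslash, 'x', and two hex digits start at index i.
    private static boolean isEscapeAt(String s, int i) {
        return i + 3 < s.length()
                && s.charAt(i) == '\\'
                && s.charAt(i + 1) == 'x'
                && Character.digit(s.charAt(i + 2), 16) >= 0
                && Character.digit(s.charAt(i + 3), 16) >= 0;
    }

    public static void main(String[] args) {
        // The backslashes are doubled so the Java literal holds the raw text \x6c\x69...
        String raw = "\\x6c\\x69b/\\x62\\x2f\\x6d\\x69nd/m\\x61x\\x2e\\x70h\\x70";
        System.out.println(unescape(raw)); // prints lib/b/mind/max.php
    }
}
```

Collecting bytes first and decoding once at the end means multi-byte sequences such as \xc3\xa9 come out as a single character, which a per-escape cast to char would not do.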
1

If you have the characters in a file to be read, you can use InputStreamReader to convert from whatever charset the string is in to a sequence of char:

InputStream is = ...; // get the input stream however you want
InputStreamReader isr = new InputStreamReader(is, "charset-name");
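
A fuller sketch of the same idea, assuming the data is UTF-8: the class and method names (CharsetReadDemo, readAll) are hypothetical, and a ByteArrayInputStream stands in for whatever stream you actually have.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answer.
final class CharsetReadDemo {
    // Reads the whole stream, decoding its bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Five bytes: "caf" plus the two-byte UTF-8 encoding of U+00E9.
        byte[] data = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        String text = readAll(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(text.length()); // prints 4
    }
}
```

Note that this decodes real UTF-8 bytes only. If the file contains the literal text \xnn, the reader returns those backslash sequences as-is and they still need to be unescaped separately.
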
Matthew Cline