4

As titled, how do I convert an ASCII String to a UTF-8 String in Java?

Thanks!

Edit: My actual situation is that I read in a Chinese String, and when I output it, it's all gibberish. I suspect the problem lies in the encoding. So how do I properly convert the String from gibberish back to the correct Chinese characters?

Jason Ching
  • possible duplicate of [How to convert Strings to and from UTF8 byte arrays in Java](http://stackoverflow.com/questions/88838/how-to-convert-strings-to-and-from-utf8-byte-arrays-in-java) and http://stackoverflow.com/questions/285228/how-to-convert-utf-8-to-us-ascii-in-java – Kazekage Gaara Jun 23 '12 at 09:15
  • Can you expand on what you need? In Java, all strings are UTF-16 by default. (Thanks to Jon Skeet for the correction.) – ilalex Jun 23 '12 at 09:17
  • @ilya: No, all strings are sequences of UTF-16 code units. – Jon Skeet Jun 23 '12 at 09:17
  • (But ilya's point about your question being unclear is correct.) – Jon Skeet Jun 23 '12 at 09:18

2 Answers

4

There's no such thing as an "ASCII string" or a "UTF-8 string" in Java. By the time you've got a String object, it's just a sequence of UTF-16 code units. There's no record of whether it was originally decoded from a byte array using ASCII or UTF-8 to interpret the bytes.

Also note that UTF-8 is backward-compatible with ASCII, in that if you've got any valid sequence of bytes representing ASCII-encoded text, that's the same sequence of bytes that would be used to represent the same text in UTF-8.
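To make the two points above concrete, here is a minimal sketch (the class name `AsciiUtf8Demo` and its helper methods are made up for illustration). It checks that ASCII text produces the same bytes under both charsets, and that decoding those bytes with either charset yields the same String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class AsciiUtf8Demo {
    // True when the text encodes to identical bytes in US-ASCII and UTF-8.
    static boolean sameBytes(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, utf8);
    }

    // True when decoding the bytes as US-ASCII and as UTF-8 gives equal Strings.
    static boolean sameString(byte[] bytes) {
        String fromAscii = new String(bytes, StandardCharsets.US_ASCII);
        String fromUtf8 = new String(bytes, StandardCharsets.UTF_8);
        return fromAscii.equals(fromUtf8);
    }

    public static void main(String[] args) {
        byte[] helloBytes = {72, 101, 108, 108, 111}; // "Hello" in ASCII
        System.out.println(sameString(helloBytes));   // prints true
        System.out.println(sameBytes("Hello"));       // prints true
    }
}
```

Once the bytes are decoded, the resulting String carries no trace of which charset was used, so there is nothing left to "convert".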

Jon Skeet
2

There's no such thing as an ASCII string or a UTF-8 string in Java. ASCII and UTF-8 are encodings: ways of representing a string as a byte array.

You do not need to do any conversion to go from an ASCII encoding of a string to a UTF-8 encoding of a string. Any valid ASCII is also a valid UTF-8 encoding of the same string. (The reverse is not true: UTF-8 bytes that encode non-ASCII characters are not valid ASCII.)
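The "reverse is not true" case is also the likely source of the gibberish described in the question. The sketch below (class and method names are hypothetical, and it assumes the Chinese text was written as UTF-8) encodes two Chinese characters as UTF-8 and then decodes the bytes once with the right charset and once with the wrong one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class DecodeDemo {
    // Encodes the text as UTF-8, then decodes those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file's own encoding does not matter.
        String chinese = "\u4e2d\u6587";
        // Decoding with the charset the bytes were written in restores the text.
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.UTF_8)));    // prints true
        // Decoding with the wrong charset does not.
        System.out.println(chinese.equals(roundTrip(chinese, StandardCharsets.US_ASCII))); // prints false
    }
}
```

When reading from a file or stream, the same idea applies: pass the charset explicitly, for example `new InputStreamReader(in, StandardCharsets.UTF_8)`, rather than relying on the platform default. If the source is not actually UTF-8 (Chinese text is also commonly stored as GBK or Big5), use that charset instead.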

Mark Byers