0
byte[] bytes = new byte[] { 1, -1 };
System.out.println(Arrays.toString(new String(bytes, "UTF-8").getBytes("UTF-8")));
System.out.println(Arrays.toString(new String(bytes, "ISO-8859-1").getBytes("ISO-8859-1")));

output:

[1, -17, -65, -67]
[1, -1]

Why doesn't the UTF-8 round trip return the original bytes?

Michael Borgwardt
seven
  • http://stackoverflow.com/questions/2544965/why-new-stringbytes-enc-getbytesenc-does-not-return-the-original-byte-array – Bozho Apr 14 '10 at 05:54

3 Answers

6

Your byte array isn't a valid UTF-8-encoded string... so the string you get from

new String(bytes, "UTF-8")

contains U+0001 (for the first byte) and U+FFFD to signify bad data in the second byte. When that string is encoded using UTF-8, you get the byte pattern shown.
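To see this for yourself, you can print the code points of the decoded string. This is only a minimal sketch: the class name `DecodeDemo` and the helper `codePoints` are made up for illustration, and it assumes Java 8 or later for `String.codePoints()`.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decodes the bytes as UTF-8 and lists each
    // resulting code point in U+XXXX notation, separated by spaces.
    static String codePoints(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // 0x01 decodes normally; 0xFF is malformed and becomes U+FFFD.
        System.out.println(codePoints(new byte[] { 1, -1 })); // U+0001 U+FFFD
    }
}
```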

Basically you shouldn't try to interpret arbitrary binary data as if it were encoded in a particular encoding. If you want to represent arbitrary binary data as a string, use something like base64.
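As a rough sketch of the base64 route, assuming Java 8 or later where `java.util.Base64` is available (on older versions you would need a third-party codec); the class name `Base64RoundTrip` and its two helpers are just illustrative:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Hypothetical helper: turns arbitrary bytes into an ASCII-only string.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Hypothetical helper: recovers the exact original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] bytes = { 1, -1 };
        String text = encode(bytes);
        System.out.println(text);                          // Af8=
        System.out.println(Arrays.toString(decode(text))); // [1, -1]
    }
}
```

Unlike the UTF-8 round trip in the question, nothing is replaced along the way: every byte value has a base64 representation, at the cost of a longer string.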

Jon Skeet
  • Thanks, Jon. I am not familiar with base64, though. How does it handle every possible byte value without losing data? – seven Apr 19 '10 at 05:38
  • @seven: I'm not sure exactly what you mean - but it converts opaque binary data to just ASCII, which is generally easy to transport. – Jon Skeet Apr 19 '10 at 06:38
  • Is it possible that some bytes outside the ASCII alphabet cannot be converted to ASCII? Thanks. – seven Apr 22 '10 at 05:01
  • @seven: No, the whole point of base64 is that it takes *any* set of bytes and converts it to ASCII. That's why it ends up being longer (4 characters for every 3 bytes). – Jon Skeet Apr 22 '10 at 05:18
2

The byte -1 (0xFF) is never valid in UTF-8. [-17, -65, -67] is the UTF-8 encoding (EF BF BD) of U+FFFD, the replacement character that the decoder substitutes for it.
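A quick way to check where those three bytes come from is to encode the replacement character directly. This is just a sketch; the class name `ReplacementCharBytes` and the helper `utf8Of` are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementCharBytes {
    // Hypothetical helper: returns the UTF-8 bytes of the given string.
    static byte[] utf8Of(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+FFFD encodes as EF BF BD, shown here as signed Java bytes.
        System.out.println(Arrays.toString(utf8Of("\uFFFD"))); // [-17, -65, -67]
    }
}
```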

Michael Borgwardt
0

String isn't a container for binary data. It is a container for char. -1 isn't a legal value for a char. There's no reason why what you're doing should ever work. Ergo, don't do it.

user207421