I have the following code:
public static void main(String args[]) throws UnsupportedEncodingException {
    System.setProperty("file.encoding", "gbk");
    String name = "こんにちわ";
    String copy = new String(name.getBytes("utf-8"));
    byte[] b1 = name.getBytes("utf-8");
    byte[] b2 = copy.getBytes();
    System.out.println("b1: " + Arrays.toString(b1));
    System.out.println("b2: " + Arrays.toString(b2));
}
The console output is:
b1: [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95, -29, -126, -113]
b2: [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95, -29, -126, 63]
Note that the last byte of the new String is different: 63 instead of -113.
Now, if I use the input String name = "こんにち"; (just 4 Japanese characters) instead, the output changes to:
b1: [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95]
b2: [-29, -127, -109, -29, -126, -109, -29, -127, -85, -29, -127, -95]
This time the bytes are exactly the same.
I use Java JDK 1.6.0_45 on Windows. The default charset is GBK.
Have I run into some encoding limitation?
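To take the platform default out of the picture, here is a minimal sketch of the same round trip with the decoding charset named explicitly. The class name CharsetRoundTrip and the helper roundTrip are made up for illustration, and the sketch assumes the JVM ships a GBK charset (it skips that case otherwise). The string literal uses Unicode escapes for "こんにちわ" so that the source file encoding does not matter.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Hypothetical helper: encode text as UTF-8, decode those bytes with
    // decodeWith, then encode the resulting String again with decodeWith.
    static byte[] roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(Charset.forName("UTF-8"));
        String copy = new String(utf8Bytes, decodeWith);
        return copy.getBytes(decodeWith);
    }

    public static void main(String[] args) {
        // "こんにちわ" written with Unicode escapes
        String name = "\u3053\u3093\u306b\u3061\u308f";
        Charset utf8 = Charset.forName("UTF-8");

        System.out.println("original: " + Arrays.toString(name.getBytes(utf8)));
        // Same charset on both sides of the round trip
        System.out.println("utf-8:    " + Arrays.toString(roundTrip(name, utf8)));

        // Mismatched case: UTF-8 bytes decoded as GBK, as happens when GBK
        // is the default charset. Guarded in case GBK is not installed.
        if (Charset.isSupported("GBK")) {
            System.out.println("gbk:      "
                    + Arrays.toString(roundTrip(name, Charset.forName("GBK"))));
        }
    }
}
```

When the decoding charset is UTF-8, the bytes come back unchanged. The GBK line is meant to reproduce what new String(bytes) and getBytes() do with a GBK default, without depending on the machine's settings.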