Config: Windows 8 (English) operating system; JDK 1.7; Eclipse.
I installed a program written by a Chinese developer, and its GUI uses Chinese characters. At first the text was rendered as ugly square boxes. I searched the internet and found a fix: in the Windows 8 Control Panel, set "Language for non-Unicode programs" to "Chinese".
But a problem arises when writing code in Eclipse. As far as I know, Java itself uses two-byte Unicode to store char and String. But when I execute the following code:
    import java.nio.charset.Charset;

    public class CharSetTest {
        public static void main(String[] args) throws Exception {
            System.out.println(Charset.defaultCharset());
            String s = "哈";
            byte[] b3 = s.getBytes("UTF-8");
            System.out.println(b3.length);
            System.out.format("%X %X %X\n", b3[0], b3[1], b3[2]);
            System.out.println(new String(b3)); // decodes with the platform default charset
            byte[] b4 = s.getBytes();           // encodes with the platform default charset
            System.out.format("%X %X\n", b4[0], b4[1]);
        }
    }
The output is weird:
    GBK       // the default charset is GBK, not Unicode or UTF-8
    3         // expected, since a Chinese character is encoded as 3 bytes in UTF-8
    E5 93 88  // the corresponding UTF-8 bytes
    鍝?       // something is wrong here
    B9 FE     // I expected s.getBytes() to use Java's default encoding, "Unicode", but that is NOT the case
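For comparison, here is a minimal sketch of the same round trip with the charset named explicitly on both sides, so the result no longer depends on the platform default. The class and method names (ExplicitCharsetDemo, roundTripUtf8, decodeUtf8As) are made up for illustration, the character is written as the escape \u54C8 (哈) so the source file encoding does not matter, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode to UTF-8 and decode with the same charset: the text survives.
    static String roundTripUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Encode to UTF-8 but decode with a different charset: the text is garbled.
    static String decodeUtf8As(String s, Charset wrong) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    public static void main(String[] args) {
        String s = "\u54C8"; // 哈
        // Matching charsets on both sides.
        System.out.println(roundTripUtf8(s).equals(s)); // prints true

        // Mismatched charsets, which is what new String(b3) does
        // when the platform default happens to be GBK.
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        System.out.println(decodeUtf8As(s, gbk).equals(s)); // prints false
    }
}
```

Both new String(bytes) and s.getBytes() without a charset argument fall back to Charset.defaultCharset(), which would explain the last two lines of the output above.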
Several questions:
- What is Java's default charset? Is it Unicode? How does Java's default charset interact with the programmer? For example, if Java uses Unicode, then it seems a string "abc" could not be encoded into other charsets, such as the ones used for Russian or French, since they are totally different encoding methods.
- What does Charset.defaultCharset() return? Does it return my Windows 8 default charset?
- Why does Charset.defaultCharset() return GBK? I didn't change any charset-related setting in Windows 8 except "Language for non-Unicode programs" in the Control Panel.
- If I declare a String in Java like String str = "abc"; I don't understand what happens in terms of charset/encoding. I first have to type the Java statement on the keyboard. How are my keystrokes translated into Java's Unicode charset? The String str is stored in my .java source file. What charset is used to store Java source code?
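A small sketch that may help in poking at these questions. It prints what the JVM reports as its default charset next to the file.encoding system property, which on JDK 7 is typically initialized from the operating system's locale settings at startup. It also shows that a \uXXXX escape in source code names a character independently of how the .java file itself is saved. The class name DefaultCharsetDemo and the helper describeDefaults are hypothetical.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Reports the JVM default charset and the system property it is usually derived from.
    static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset()
             + ", file.encoding=" + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        // The values printed here depend on the machine and its locale settings.
        System.out.println(describeDefaults());

        // A Unicode escape is resolved by the compiler, so this literal is the
        // same character no matter which charset the source file is stored in.
        String escaped = "\u54C8"; // 哈
        System.out.println(escaped.length());                       // prints 1
        System.out.println(Integer.toHexString(escaped.charAt(0))); // prints 54c8
    }
}
```

For literals typed directly as non-ASCII characters, the compiler has to be told how the file was saved, for example with javac -encoding UTF-8 CharSetTest.java on the command line, or through the text file encoding setting in Eclipse.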
EDIT:
Why do we say "Java uses Unicode to represent char and String"? In my Java program, when should I care about Unicode?
Usually, I only need to care about encoding/decoding with UTF-8, ISO-8859-1, GBK, etc., but I never care about the Unicode representation of char and String. So how and when should I use Unicode?
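One situation where the internal representation seems to become visible is with characters outside the Basic Multilingual Plane. A String is a sequence of UTF-16 code units, so such a character occupies two char values (a surrogate pair). The sketch below, with hypothetical names CodePointDemo, charCount and codePointCount, compares the two ways of counting.

```java
public class CodePointDemo {
    // Number of UTF-16 code units (what String.length() reports).
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters in the Unicode sense).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u54C8";                 // 哈, fits in a single char
        String supplementary = "\uD83D\uDE00"; // U+1F600, needs a surrogate pair

        System.out.println(charCount(bmp) + " " + codePointCount(bmp));                     // prints 1 1
        System.out.println(charCount(supplementary) + " " + codePointCount(supplementary)); // prints 2 1
        System.out.println(Integer.toHexString(supplementary.codePointAt(0)));              // prints 1f600
    }
}
```

As long as text is only moved between byte arrays and strings, naming the charset at the boundary is usually enough. The char versus code point distinction matters mainly when indexing, truncating or counting characters.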