So I have this simple code:

public class FooBar {
    public static void main(String[] args) {
        String foo = "ğ";
        System.out.println(foo.getBytes().length);
    }
}

And let me compile it and run it:

$ javac FooBar.java
$ java -Dfile.encoding=UTF-32 FooBar
4

OK, I am not surprised that the character took 4 bytes, because I told Java to use UTF-32 as the default encoding when running the program.

Let's try running the program with UTF-8 encoding:

$ java -Dfile.encoding=UTF-8 FooBar
2

All seems fine.

Currently the class file (FooBar.class) is 451 bytes. I will change the code like this:

public class FooBar {
    public static void main(String[] args) {
        String foo = "ğğ";
        System.out.println(foo.getBytes().length);
    }
}

I compile it again and see that the file on disk is now 453 bytes.

Obviously, the string inside the .class file is stored on disk in UTF-8, since one extra ğ added 2 bytes. If I run this .class file now with UTF-32 encoding:

$ java -Dfile.encoding=UTF-32 FooBar
8

Well, all seems fine, but is there any way to tell the compiler to encode String characters in the .class file using UTF-32?

Koray Tugay
  • Probably this thread might be helpful: http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding – Konstantin Yovkov Jan 21 '16 at 11:05
  • @KonstantinYovkov How is it even related? My question is about compile-time, that question is about run-time. – Koray Tugay Jan 21 '16 at 11:06
  • You read it very quickly :) One of the answers suggests that you can set a default character encoding by setting the `JAVA_TOOL_OPTIONS` environment variable to `-Dfile.encoding=UTF-32` – Konstantin Yovkov Jan 21 '16 at 11:08
  • Where do I find JAVA_TOOL_OPTIONS in osx? – Koray Tugay Jan 21 '16 at 11:09
  • Probably you have to create an environment variable with the same name and set its value. Play a bit with this, and research about this `JAVA_TOOL_OPTIONS` more carefully - at the least, I just suggested what you could do and can't tell if this will work 100%. – Konstantin Yovkov Jan 21 '16 at 11:12

1 Answer

The system property file.encoding determines the default charset at runtime, but it has no effect on how the compiler writes string constants into the class file.

Java class files have a defined binary data structure which cannot be changed (unless you write your own compiler and class loader).

Therefore the encoding of strings in the constant pool is always modified UTF-8.
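
To get a feel for modified UTF-8, here is a minimal sketch using `DataOutputStream.writeUTF`, which writes a 2-byte length followed by the modified UTF-8 bytes, the same general shape as a string entry in the constant pool. The class name `ModifiedUtf8Sketch` and the helper `writeUtfSize` are made up for illustration, and this does not read an actual .class file; it only shows that each extra ğ costs 2 bytes, which matches the 451 to 453 byte growth in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Sketch {
    // Hypothetical helper: number of bytes writeUTF emits for s,
    // i.e. a 2-byte length prefix plus the modified UTF-8 payload.
    public static int writeUtfSize(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        // "\u011F" is ğ, written as an escape so the source file encoding does not matter.
        System.out.println(writeUtfSize("\u011F"));       // prints 4 (2 length + 2 payload)
        System.out.println(writeUtfSize("\u011F\u011F")); // prints 6 (2 length + 4 payload)
    }
}
```

The difference between the two results is 2 bytes, regardless of which `file.encoding` the JVM is started with.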

wero
  • Hmm I see. So it converts the modified UTF-8 to UTF-32 in my case at runtime then? – Koray Tugay Jan 21 '16 at 11:18
  • @KorayTugay actually when the class is loaded it generates a String object from the constant pool entry (converting modified UTF-8 to UTF-16) and when you call `String.getBytes()` it converts the string to UTF-32 bytes using your default encoding. – wero Jan 21 '16 at 11:21
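
As a rough illustration of that runtime conversion, the sketch below passes the charset to `getBytes` explicitly, so the result does not depend on `-Dfile.encoding`. The class name `ExplicitCharsetSketch` and the helper `encodedLength` are hypothetical, and it assumes the JDK in use ships a `UTF-32` charset (it is not one of the constants in `StandardCharsets`).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {
    // Hypothetical helper: byte length of s when encoded with the given charset.
    public static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "\u011F" is ğ, written as an escape so the source file encoding does not matter.
        String foo = "\u011F";

        // Assumption: the runtime provides a charset named "UTF-32".
        Charset utf32 = Charset.forName("UTF-32");

        System.out.println(foo.length());                               // prints 1 (one UTF-16 code unit)
        System.out.println(encodedLength(foo, StandardCharsets.UTF_8)); // prints 2
        System.out.println(encodedLength(foo, utf32));                  // prints 4
    }
}
```

The same in-memory string yields 2 or 4 bytes purely depending on the charset chosen at encoding time, which is why the numbers in the question change with `file.encoding` while the .class file stays the same.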