Why .class is UTF-8, but runtime .class is UTF-16?
-
Why not? What if the JVM needs UTF-16 but the compiled file doesn't? What is the problem? – AxelH Dec 30 '16 at 09:28
-
@AxelH Relax. He just wants to know why the JVM would need UTF-16 rather than UTF-8. This is legitimate! – AhmadWabbi Dec 30 '16 at 09:31
-
@AhmadWabbi I didn't say his question was stupid... I was just pointing out some ideas... – AxelH Dec 30 '16 at 10:05
-
Possible duplicate of [Why Java char uses UTF-16?](http://stackoverflow.com/questions/36236364/why-java-char-uses-utf-16) – Joe Dec 30 '16 at 10:21
3 Answers
Why .class is UTF-8
For classes written for a Western audience, which are usually mostly ASCII, this is the most compact encoding.
but runtime .class is UTF-16?
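At runtime it's quicker to manipulate strings that use a fixed-width encoding (see [Why Java char uses UTF-16?](http://stackoverflow.com/questions/36236364/why-java-char-uses-utf-16)), so UCS-2 was chosen. This was later complicated by the move from UCS-2 to UTF-16, which turned the runtime representation into yet another variable-width encoding.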
As noted in the comments of that question, JEP 254 allows for the runtime representation to change to something more space efficient (e.g., Latin-1).
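The compactness argument can be checked with a small sketch. The class and method names (`EncodingSizes`, `utf8Bytes`, `utf16Bytes`) are made up for illustration, and the code uses standard UTF-8 rather than the modified UTF-8 of the class file format; for plain ASCII text like the sample below the two produce the same bytes.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Number of bytes needed to encode the text as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes needed to encode the text as UTF-16
    // (big-endian, so no byte order mark is added).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // An ASCII-only name of the kind a class file typically stores.
        String name = "java/lang/Object";
        System.out.println(utf8Bytes(name));  // 16
        System.out.println(utf16Bytes(name)); // 32
    }
}
```

For ASCII-only names, UTF-8 needs one byte per character while UTF-16 needs two, which is where the size advantage on disk comes from.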

-
*At runtime it's quicker to manipulate strings that use a fixed-width encoding* => was the encoding actually fixed-width when it was introduced, or was there already a notion of graphemes requiring multiple code points? – Matthieu M. Dec 30 '16 at 12:17
-
Sort of; despite UTF-16 supporting variable-width encoding from its first appearance in Unicode 2.0 in 1996 (http://www.unicode.org/faq/utf_bom.html), UTF-16 was effectively fixed-width until Unicode 3.1 in 2001, which Java only supported from J2SE 5 (http://www.oracle.com/us/technologies/java/supplementary-142654.html). – Joe Dec 30 '16 at 12:27
Source code can have any encoding; you can tell the compiler which encoding to use with the `-encoding` flag.
The JVM uses UTF-16, and it's specified in the JLS:
The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.
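What "16-bit code units" means in practice can be seen in a minimal sketch. The class and method names here (`CodeUnits`, `codeUnits`, `codePoints`) are hypothetical; the calls themselves (`String.length`, `String.codePointCount`, `Character.toChars`) are standard library methods.

```java
public class CodeUnits {
    // String.length() counts UTF-16 code units, not characters.
    static int codeUnits(String s) {
        return s.length();
    }

    // codePointCount() counts Unicode code points instead.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+0041 lies in the Basic Multilingual Plane: one code unit.
        String bmp = "A";
        // U+1F600 is a supplementary character: a surrogate pair in UTF-16.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1
    }
}
```

A supplementary character occupies two `char` values, which is why the runtime representation is no longer fixed-width.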

-
This doesn't answer the question; the question is about the encoding of the `.class`-file, not the `.java` file. – Mark Rotteveel Dec 30 '16 at 10:08
`-encoding encoding`: Set the source file encoding name, such as EUC-JP and UTF-8. If `-encoding` is not specified, the platform default converter is used.
Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
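To see which default charset a particular JVM picked up at startup, a minimal sketch along these lines should work. The class name `DefaultCharsetDemo` and the helper `defaultCharsetName` are made up for illustration; `Charset.defaultCharset()` is the standard API.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Canonical name of the charset this JVM selected as its default.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The printed value depends on the JVM version, OS, and locale settings.
        System.out.println(defaultCharsetName());
    }
}
```

Note that this only affects how source files and I/O streams are decoded by default; it says nothing about the encoding used inside the compiled class file.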
-
This doesn't answer the question; the question is about the encoding of the `.class`-file, not the `.java` file. – Mark Rotteveel Dec 30 '16 at 10:08
-
For the first part of the question, "Why .class is UTF-8": when you compile a .java file with javac to get the .class file, it uses the default encoding if the -encoding option is not specified. For the second part, "but runtime .class is UTF-16?": at runtime the JVM deals with native libraries, hence UTF-16. – puvi Dec 30 '16 at 12:39
-
Again, that doesn't answer the question. You keep talking about the source file, the question is about the encoding used in the compiled `class` file, which is always UTF-8, and the Java process using UTF-16 at runtime. – Mark Rotteveel Dec 31 '16 at 08:26