11

Why is .class UTF-8, but runtime .class UTF-16?


lospejos
  • 1,976
  • 3
  • 19
  • 35
Jerry_W
  • 125
  • 5
  • Why not? What if the JVM needs UTF-16 but the compiled file doesn't? What is the problem? – AxelH Dec 30 '16 at 09:28
  • 1
    @AxelH Relax. He just wants to know why the JVM would need UTF-16 rather than UTF-8. This is legitimate! – AhmadWabbi Dec 30 '16 at 09:31
  • @AhmadWabbi I didn't say his question was stupid... I was just pointing out an idea... – AxelH Dec 30 '16 at 10:05
  • 1
    Possible duplicate of [Why Java char uses UTF-16?](http://stackoverflow.com/questions/36236364/why-java-char-uses-utf-16) – Joe Dec 30 '16 at 10:21

3 Answers

8

Why .class is UTF-8

For classes written for a Western audience, which are usually mostly ASCII, this is the most compact encoding.
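The size difference can be sketched with a small comparison. This is only an illustration: the sample strings and the class name `EncodingSizeDemo` are made up for this example, and it uses standard UTF-8, whereas the class file format actually stores its string constants in a *modified* UTF-8 that differs for U+0000 and for supplementary characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizeDemo {
    // Bytes needed to store the text as standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store the text as UTF-16.
    // UTF_16BE is used so that no byte-order mark is prepended.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Hypothetical samples: an ASCII-only name and a three-character
        // CJK string (written with Unicode escapes so the source file
        // encoding does not matter).
        String ascii = "java/lang/String";
        String cjk = "\u65e5\u672c\u8a9e";

        System.out.println("ASCII  UTF-8: " + utf8Length(ascii) + ", UTF-16: " + utf16Length(ascii));
        System.out.println("CJK    UTF-8: " + utf8Length(cjk) + ", UTF-16: " + utf16Length(cjk));
    }
}
```

For the ASCII-only name UTF-8 takes half the space of UTF-16, while for the CJK sample UTF-8 is the larger of the two, which is why the "most compact" claim depends on the text being mostly ASCII.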

but runtime .class is UTF-16?

At runtime it's quicker to manipulate strings that use a fixed-width encoding (see Why Java char uses UTF-16?), so UCS-2 was chosen. This was later complicated by the move from UCS-2 to UTF-16, which made the runtime representation a variable-width encoding as well.
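A minimal sketch of what "variable-width" means in practice: a character outside the Basic Multilingual Plane occupies two `char` code units (a surrogate pair) in a `String`. The class name `SurrogateDemo` and the choice of U+1F600 as the sample are assumptions made for this illustration.

```java
public class SurrogateDemo {
    // Number of 16-bit UTF-16 code units in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";                                           // U+0041, inside the BMP
        String supplementary = new String(Character.toChars(0x1F600)); // outside the BMP

        System.out.println(codeUnits(bmp) + " unit(s), " + codePoints(bmp) + " code point(s)");
        System.out.println(codeUnits(supplementary) + " unit(s), " + codePoints(supplementary) + " code point(s)");

        // The first char of the supplementary string is only half of the character.
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));
    }
}
```

Indexing with `charAt` therefore stays constant-time per code unit, but it no longer lines up with characters once supplementary code points are involved.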

As noted in the comments of that question, JEP 254 (Compact Strings) allows the runtime representation to change to something more space-efficient (e.g., Latin-1).

Joe
  • 29,416
  • 12
  • 68
  • 88
  • *At runtime it's quicker to manipulate strings that use a fixed-width encoding* => was the encoding actually fixed-width when it was introduced, or was there already a notion of graphemes requiring multiple code points? – Matthieu M. Dec 30 '16 at 12:17
  • 1
    Sort of; despite UTF-16 supporting variable-width encoding from its first appearance in Unicode 2.0 in 1996 (http://www.unicode.org/faq/utf_bom.html), UTF-16 was effectively fixed-width until Unicode 3.1 in 2001, which was only supported in J2SE 5 (http://www.oracle.com/us/technologies/java/supplementary-142654.html). – Joe Dec 30 '16 at 12:27
-1

Source code can have any encoding; you can tell the compiler which encoding to use with the -encoding flag.

The JVM uses UTF-16, and it's specified in the JLS:

The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.
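As a rough illustration of the "16-bit code units" wording, the constants on `java.lang.Character` expose the width of a `char` directly. The class name `CharWidthDemo` is made up for this sketch.

```java
public class CharWidthDemo {
    // Width of one char (one UTF-16 code unit) in bits.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Largest value a single char can hold.
    static int maxCharValue() {
        return Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("bits per char:  " + bitsPerChar());
        System.out.println("bytes per char: " + Character.BYTES);
        System.out.println("max char value: " + maxCharValue());

        // A String exposes its contents as a sequence of these code units.
        char[] units = "UTF-16".toCharArray();
        System.out.println("code units in \"UTF-16\": " + units.length);
    }
}
```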

Maroun
  • 94,125
  • 30
  • 188
  • 241
-2

javac encoding:

-encoding encoding Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.

JVM encoding:

Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets. The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
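To make the distinction concrete, here is a small sketch (class and method names are hypothetical) showing that the default charset only matters when text is converted to or from bytes. The in-memory `String` has the same length in `char` units whichever charset is later used to encode it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    // Encoded size of the text in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encoded size of the text in ISO-8859-1 (Latin-1).
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // Platform dependent, so no particular value is assumed here.
        System.out.println("default charset: " + Charset.defaultCharset());

        String s = "\u00e9"; // e with acute accent, written as an escape
        System.out.println("chars in memory: " + s.length());
        System.out.println("bytes as UTF-8:  " + utf8Bytes(s));
        System.out.println("bytes as Latin-1: " + latin1Bytes(s));
    }
}
```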

Maroun
  • 94,125
  • 30
  • 188
  • 241
puvi
  • 231
  • 2
  • 10
  • 1
    This doesn't answer the question; the question is about the encoding of the `.class`-file, not the `.java` file. – Mark Rotteveel Dec 30 '16 at 10:08
  • 1
    For the first part of the question, "Why .class is UTF-8": when you compile a java file to a .class file with javac, it uses the default encoding if the -encoding option is not specified. For the second part, "but runtime .class is UTF-16?": at runtime the JVM deals with native libraries, hence UTF-16. – puvi Dec 30 '16 at 12:39
  • Again, that doesn't answer the question. You keep talking about the source file; the question is about the encoding used in the compiled `class` file, which is always UTF-8, and the Java process using UTF-16 at runtime. – Mark Rotteveel Dec 31 '16 at 08:26