Why is Bytecode not human-readable?

Question

I'm confused about a certain topic:

When you compile Java or Python, you get bytecode which will run on the respective VMs. In a previous question I had asked why, when you open a .pyc or .class file in a text editor, it appears as gibberish and not like readable bytecode (LOAD, STORE operations etc).

Now, the answer I got at the time was based on the argument "That's like saying if you opened an .exe file and expected to see x86 assembly", and they made the analogy that the bytecode I've seen is the "assembly" version of the real bytecode, which is not readable.

This would be okay and make sense if not for one thing. You can't compare an exe file to a bytecode file. An exe file is ALREADY compiled to machine code. A bytecode file is NOT. A bytecode file is fed to a VM which then interprets it (usually with JIT).

That means that whoever wrote the JVM, for instance (which is just a piece of software itself), would need to write a bytecode interpreter. And I really doubt they wrote an interpreter to handle the following:

Java .class file:

I could be wrong and maybe they DID write an interpreter to handle this form of bytecode for some odd reason, but it doesn't seem likely. However, if the JVM handles the "assembly" version of the bytecode, then that would mean the cycle is

.java -> .class (unreadable) -> .class (readable right as it enters the JVM)

There's almost a meaningless step in between.

I'm just really confused at this point.

I think this is a good question. Being human-readable is not one of the purposes of an intermediate code. Memory efficiency and easy readability (by the JVM) are more important. — roozbeh sharifnasab, Jul 04 '20 at 18:09
Which are you asking: "what mechanism causes bytecode to not be human-readable" or "what was the motivation behind not making bytecode human-readable"? — Joni, Jul 04 '20 at 18:10
It is perfectly human-readable, but not through a text editor. All (?) IDEs can open a .class file and show the bytecode: https://stackoverflow.com/questions/202586/best-free-java-class-viewer https://en.m.wikipedia.org/wiki/Java_bytecode_instruction_listings — PeterMmm, Jul 04 '20 at 18:13
That's just because bytecode is a serialized version of what is supposed to be loaded into memory without an additional translation step (which would be required to interpret human-readable statements). — yegodm, Jul 04 '20 at 18:14
The JVM *is* a bytecode interpreter. Anything that reads a program and executes it (and is not the hardware) is called an interpreter. And writing an interpreter for machine-readable byte code is much simpler than writing an interpreter for something human-readable. — Ole V.V., Jul 04 '20 at 18:16
Remember that Java's goal is to be platform-independent. That's why Java requires a virtual machine. To accomplish this the program needs to be stored in a format understandable to any JVM implementation. They designed a format specially for this—Java bytecode. This bytecode is then read by the JVM and converted into _platform-specific machine code_. Using Java source code would be less efficient; parsing text is harder and more resource intensive and source code typically takes up more space. — Slaw, Jul 04 '20 at 19:02
Another point to bring up is that Java is made up of two specifications: The _Java Language Specification_ and the _Java Virtual Machine Specification_. The former specification defines the Java language (source code) and it's compiled into bytecode which is defined by the latter specification. — Slaw, Jul 04 '20 at 19:04
I can’t tell if you’re saying “there should be no compilation step,” suggesting that the Java runtime should accept source files, or if you’re saying that “there should be only one compilation step,” suggesting that the compiler should immediately produce native machine code. The former is not only inefficient, but would throw away the extremely valuable step of detecting a lot of code errors before distributing the program. The latter would remove the ability to write platform independent code, as others have pointed out. — VGR, Jul 05 '20 at 00:23
@VGR: I think the OP is expecting bytecode that is not the Java language, but is nevertheless human-readable, and everything else to be the same. It's not impossible, it's just less efficient. — Louis Wasserman, Jul 05 '20 at 20:50
It is also worth noting that Java came out in 1995. At the time, a typical PC had about 4 megabytes of RAM. And, Java’s primary intended use at the time was downloadable applets. Compiled .class files had to be compact, both for storage and for download at a time when most people were using modems. — VGR, Jul 05 '20 at 21:14

score 13 · Answer 1 · answered Jul 04 '20 at 18:07

They did write an interpreter for this form of bytecode. They read it as bytes, of course, not ASCII characters, which makes it more usable. For example, each instruction's opcode takes only one byte, rather than, say, the five bytes needed to spell out `store` as text.

The goal was to have something compact in memory usage, but not actually compiled to machine code that would be specific to only one device. Java bytecode is more or less its own form of machine code.

If you would like to read it, however, use the `javap` command (for example, `javap -c`) to disassemble it into a more readable form.
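To make the one-byte-per-opcode point concrete, here is a minimal Python sketch. The four opcode values come from the JVM instruction set, but the `OPCODES` table and the sample `code` bytes are made up for illustration; they are not read from a real .class file, which also contains headers and a constant pool.

```python
# A handful of real JVM opcode values. The table itself is a hypothetical
# helper for this illustration, not part of any JVM or tool.
OPCODES = {
    0x04: "iconst_1",  # push the int constant 1
    0x3B: "istore_0",  # pop an int into local variable 0
    0x1A: "iload_0",   # push local variable 0
    0xAC: "ireturn",   # return the int on top of the stack
}

# Four instructions, four bytes: this is the form the JVM reads.
code = bytes([0x04, 0x3B, 0x1A, 0xAC])

# What a text editor does: treat every byte as a character.
as_text = code.decode("latin-1")

# What a disassembler does: look each byte up in an opcode table.
mnemonics = [OPCODES[b] for b in code]

# Size of the same program if the mnemonics were stored as text instead.
text_size = len("\n".join(mnemonics).encode("ascii"))

print(repr(as_text))
print(mnemonics)
print(len(code), "bytes as bytecode vs", text_size, "bytes as text")
```

Viewed as text, the `istore_0` byte shows up as `;` and its neighbours as control or accented characters, which is the "gibberish" a text editor displays.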

Louis Wasserman

So just to be completely clear, they wrote an interpreter for the unreadable bytecode? I guess I can see that. It's just difficult to grasp as I'm studying compiler design, and I'd imagine that form is difficult to parse. – Lauren 835 Jul 04 '20 at 18:09
1

I came here to say pretty much the same thing. Bytecode is binary because that is efficient both to transmit and to execute. There is little enough reason to look at it, ever -- no one ever expected to be writing programs in the 'assembler language' equivalent to bytecode. – arcy Jul 04 '20 at 18:09
4

It's _not_ hard to read if you read it as bytes, instead of as letters. But, for example, the bytecode for `istore_0` is the byte `3b`, which if you try to open it in Notepad looks like a `;`. – Louis Wasserman Jul 04 '20 at 18:09
1

@Lauren835 "I'd imagine that form is difficult to parse" -- they don't *parse* it, they *execute* it. Imagine a big switch statement with bytecode values as cases -- for each case, a subprogram that does what that bytecode indicates. They saved themselves the necessity for parsing it when they made it binary. – arcy Jul 04 '20 at 18:10
@arcy: Eh, I'm not sure how much I buy that distinction? It's a parsing-and-executing-and-also-compiling-because-JIT step, all pretty mushed together. – Louis Wasserman Jul 04 '20 at 18:11
1

I think of "parsing", especially for computer languages, as the process of interpreting text to arrive at some form of computer execution. The bytecode, as pointed out, is not text. No parsing is necessary to execute it. – arcy Jul 04 '20 at 18:17
https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html you definitely need to parse this – Joni Jul 04 '20 at 19:11
6

I don’t think that it is necessary to emphasize the compactness. The important point is that the M in JVM stands for *machine*; therefore, it’s straightforward to design the bytecode to be *machine readable* instead of *human readable*. It’s not clear why the OP thinks a human-readable bytecode would help software like the JVM process it. Especially someone who is “studying compiler design” should understand that processing human-readable source code is anything but trivial. – Holger Jul 06 '20 at 09:35
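The "big switch statement" picture from the comments above can be sketched as a toy stack machine in Python. The opcode numbers are borrowed from the JVM instruction set, but the `run` function and the sample `program` are illustrative assumptions, not how HotSpot or any real JVM is implemented. The sketch covers only instruction dispatch and ignores the rest of the class file format (constant pool, method tables, and so on).

```python
# Toy stack machine: one branch per opcode value, no text parsing involved.
ICONST_1, ICONST_2, IADD, IRETURN = 0x04, 0x05, 0x60, 0xAC

def run(code: bytes) -> int:
    stack = []
    pc = 0  # index of the next byte to execute
    while True:
        opcode = code[pc]
        pc += 1
        # The "big switch": dispatch directly on the byte value.
        if opcode == ICONST_1:
            stack.append(1)
        elif opcode == ICONST_2:
            stack.append(2)
        elif opcode == IADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode 0x{opcode:02x}")

# Hypothetical program: push 1, push 2, add them, return the sum.
program = bytes([ICONST_1, ICONST_2, IADD, IRETURN])
result = run(program)
print(result)
```

Because each instruction is already a number, the loop needs no tokenizer or grammar; a textual format would require both before the same dispatch could happen.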

score 1 · Answer 2 · answered Jul 04 '20 at 18:56

Bytecode is the "machine code" for a virtual machine. As such, it has much the same goals and restrictions as "real" machine code - compact, efficient decoding, etc.

The fact that bytecode is executed by a virtual machine rather than by a "real" machine is not particularly relevant.
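The same split between the raw form and the readable form exists for Python, which the question also mentions. As a rough sketch using the standard `dis` module, the snippet below shows the byte string CPython executes next to the mnemonics a disassembler derives from it. The `source` statement and variable names are arbitrary, and the exact bytes and opcode names vary between Python versions.

```python
import dis

# Compile a one-line statement to a CPython code object.
source = "total = price * quantity"
code_obj = compile(source, "<example>", "exec")

# The raw instruction stream: compact bytes, meaningless in a text editor.
raw = code_obj.co_code
print(raw)

# The same instructions decoded into mnemonics by the standard disassembler.
names = [instr.opname for instr in dis.get_instructions(code_obj)]
print(names)
```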
