What are the differences between bytecode binary executables, such as Java class files, Parrot bytecode files, or CLR files, and machine code executables, such as ELF, Mach-O, and PE?

What are the distinctive differences between the two?

For example, which part of a class file corresponds to the .text section of an ELF file?

Or: they all have headers, but the ELF and PE headers record the target architecture while the class file header does not.

[Image: Java class file]

[Image: ELF file]

[Image: PE file]

Ciro Santilli OurBigBook.com
zeitue

3 Answers

Byte code is, as imulsion noted, an intermediate step before compilation into machine code. Because that last step is left to load time (and often to run time, as is the case with Just-In-Time (JIT) compilation), byte code is architecture independent: the runtime (CLR for .NET or JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
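
The architecture independence shows up directly in the file headers. As a minimal sketch (the function name `describe_header` and the small `ELF_MACHINES` table are illustrative choices, not part of any library), the following reads the first bytes of a file image and reports what the header says about its target:

```python
import struct

# A small subset of e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_header(data: bytes) -> str:
    """Return a one-line summary of an ELF or Java class file header."""
    if data[:4] == b"\x7fELF":
        # e_ident[5] is the byte order: 1 = little-endian, 2 = big-endian.
        endian = "<" if data[5] == 1 else ">"
        # e_type and e_machine are 16-bit fields at offsets 16 and 18.
        _e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
        arch = ELF_MACHINES.get(e_machine, f"machine {e_machine}")
        return f"ELF for {arch}"
    if data[:4] == b"\xca\xfe\xba\xbe":
        # Class files are big-endian: magic, minor_version, major_version.
        # (Mach-O universal binaries share this magic; this sketch does not
        # try to tell the two apart.)
        minor, major = struct.unpack_from(">HH", data, 4)
        return f"Java class file, format version {major}.{minor}, no architecture field"
    return "unknown format"


# Hand-built header prefixes, just enough bytes for the fields read above.
elf = b"\x7fELF\x02\x01\x01" + bytes(9) + struct.pack("<HH", 2, 62)
cls = b"\xca\xfe\xba\xbe" + struct.pack(">HH", 0, 52)
print(describe_header(elf))  # ELF for x86-64
print(describe_header(cls))  # Java class file, format version 52.0, no architecture field
```

The ELF header names a machine; the class file header only names a format version, which tells the JVM which bytecode features to expect.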

By comparison, native code (Windows: PE and PE32+; OS X/iOS: Mach-O; Linux/Android/etc.: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM; most others: Intel 32-bit (i386) or 64-bit). These formats are all very similar, and all of them need sections (or, in Mach-O parlance, "Load Commands") to set up the memory structure of the executable as it becomes a process. (Old DOS supported the ".com" format, which was a raw memory image.) For all of the above you can say, roughly, the following:

  • Sections whose names start with a "." are created by the compiler, and are "default", that is, expected to have default behavior
    • The executable has a main code section, usually called "text" or ".text". This is native code, which can run only on the specific architecture.
    • Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as for symbol names.
    • Symbols, which are what the linker uses to put the executable together with its libraries (Windows: DLLs; Linux/Android: shared objects; OS X/iOS: .dylibs or frameworks), are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table), which lets the compiler simply emit stubs for the functions you call (printf, open, etc.) that the dynamic linker connects when the executable loads.
    • The import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X an LC_LOAD_DYLIB load command) declares the additional libraries needed. If those aren't found when the executable is loaded, the load fails and you can't run it.
    • The export table (for libraries/dylibs/etc.) lists the symbols which the library (or, in Windows, even an .exe) exports so that others can link against it.
    • Constants are usually in the section you see as ".rodata".
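
To see the sections listed above in a real binary, one can walk the ELF section header table. The following is a minimal sketch that assumes a 64-bit little-endian ELF image; `section_names` is a hypothetical helper name, and the struct layouts follow the ELF64 header and section header definitions. It ignores edge cases such as extended section numbering.

```python
import struct

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64-byte file header
ELF64_SECTION = struct.Struct("<IIQQQQIIQQ")       # 64-byte section header


def section_names(data: bytes) -> list:
    """List the section names of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    header = ELF64_HEADER.unpack_from(data, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = (
        header[6], header[11], header[12], header[13])

    def read_section(index):
        return ELF64_SECTION.unpack_from(data, e_shoff + index * e_shentsize)

    # The section-name string table is itself a section; e_shstrndx is its index.
    strtab = read_section(e_shstrndx)
    str_offset, str_size = strtab[4], strtab[5]
    strings = data[str_offset:str_offset + str_size]

    names = []
    for i in range(e_shnum):
        sh_name = read_section(i)[0]  # offset of the name in the string table
        end = strings.index(b"\x00", sh_name)
        names.append(strings[sh_name:end].decode("ascii"))
    return names
```

On a Linux system, calling `section_names(open("/bin/ls", "rb").read())` would typically show names such as `.text`, `.rodata`, `.dynamic` and `.plt`, though the exact list depends on the toolchain that built the binary.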

Hope this helps. Really, your question was vague.

Technologeeks

Byte code is a 'halfway' step. The Java compiler (javac) turns the source code into byte code. Producing machine code is the next step: the runtime on the computer takes the byte code, turns it into machine code (which the processor can execute), and then runs your program. Computers cannot execute source code directly, and in this model the compiler does not translate straight into machine code, so the byte code is the halfway step that makes the program work.
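
As a rough illustration of what "taking the byte code and executing it" means, here is a toy interpreter for a handful of JVM instructions. The opcode values are the real JVM encodings, but everything else is a simplification: the function name `run` is made up for this sketch, there is no class loading or verification, Python integers do not wrap at 32 bits, and a real JVM would usually JIT-compile hot code rather than interpret it this way.

```python
# Real JVM opcode values for the few instructions this sketch supports.
ICONST_0 = 0x03  # iconst_0 .. iconst_5 occupy 0x03 .. 0x08
ICONST_5 = 0x08
BIPUSH = 0x10    # push the following byte as a signed integer
IADD = 0x60
IMUL = 0x68
IRETURN = 0xAC


def run(code: bytes) -> int:
    """Interpret a tiny subset of JVM bytecode on an operand stack."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if ICONST_0 <= op <= ICONST_5:
            stack.append(op - ICONST_0)
        elif op == BIPUSH:
            value = code[pc]
            pc += 1
            stack.append(value - 256 if value > 127 else value)
        elif op == IADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == IMUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    raise ValueError("reached the end of the code without ireturn")


# iconst_2, iconst_3, iadd, bipush 10, imul, ireturn  ->  (2 + 3) * 10
print(run(bytes([0x05, 0x06, 0x60, 0x10, 0x0A, 0x68, 0xAC])))  # 50
```

The same seven bytes mean the same thing on any processor; only the interpreter (or JIT compiler) underneath is architecture specific.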

imulsion
  • That's not really what he asked about. – Joachim Isaksson Aug 30 '12 at 07:02
  • This information is helpful, but it does not tell me the structural differences and similarities. It does make me think that the byte code is simply the function calls (.text) and the variables defined (.data). – zeitue Aug 30 '12 at 07:27
  • Actually there are computer architectures which can execute Java bytecode natively. – Antimony Aug 31 '12 at 04:10

Note that ELF binaries don't necessarily need to be machine/arch specific per se.

The interesting piece is the "interpreter" entry (the PT_INTERP program header): it holds the path name of a loader program that is executed instead of the actual binary. That loader is then responsible for loading the actual program, loading and linking libraries, etc. This is how e.g. ld.so comes in.
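
In the ELF format the interpreter path is stored in a program header of type PT_INTERP. A minimal sketch for reading it, assuming a 64-bit little-endian ELF image (the helper name `find_interpreter` is hypothetical):

```python
import struct

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64-byte file header
ELF64_PHDR = struct.Struct("<IIQQQQQQ")            # 56-byte program header
PT_INTERP = 3                                      # program header type


def find_interpreter(data: bytes):
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    header = ELF64_HEADER.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = header[5], header[9], header[10]
    for i in range(e_phnum):
        fields = ELF64_PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = fields[0], fields[2], fields[5]
        if p_type == PT_INTERP:
            # The path is stored in the file as a NUL-terminated string.
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode("ascii")
    return None  # statically linked binaries have no interpreter entry
```

On a typical x86-64 Linux system, `find_interpreter(open("/bin/ls", "rb").read())` would be expected to return the path of the dynamic loader; the exact path varies by distribution and architecture.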

Theoretically one could create an ELF binary that holds Java bytecode (or a complete jar). This just needs an appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.

I'm not sure whether this has actually been done before, but it is certainly possible.

The same can be done with almost any non-native code.

It could also serve for direct multiarch support via some VM like qemu: let the target platform (libc + linker scripts) put the arch name into the interpreter program name (e.g. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...). Then, on a particular arch (e.g. x86_64), the name with the native arch will point to the original ld.so, while the others point to a special one that calls up something like qemu-XXX (QEMU's user-mode emulator).