37

I was briefly reading about Maxine, which is an open source JVM implementation that is written in Java. This sounds circular to me. If Java requires a virtual machine to run in, how can the virtual machine itself be written in Java (won't the VM code require a VM in which to run, and so on)?

Edit: Ok, so I see I overlooked the fact that Java doesn't have to run in a VM. How then does one explain how a LISP compiler can be written in LISP? Or should this be a new question altogether?

ewernli
kji
  • 1
    Wasn't the first C++ compiler by Bjarne Stroustrup written in C++ (back when it was still called "C With Classes")? Which I would consider even more impressive, since C++ is not an interpreted language but requires a compiler! – mxk Feb 17 '10 at 08:47
  • 1
    Which is precisely what I don't understand :) – kji Feb 17 '10 at 08:49
  • 5
    The New Dragon Book, First Edition (stay away from the error-ridden second edition) explains compiler bootstrapping. – Ignacio Vazquez-Abrams Feb 17 '10 at 08:55
  • At the extreme, you could write an assembler and linker in Java (there's nothing magic about an assembler). You could write a parser for Java source code, in Java. You could write a compiler based on your new parser that generates assembly, which you assemble and link to a native executable using your assembler and linker. – nos Aug 08 '12 at 16:39

10 Answers

15

Your assumption that Java requires a virtual machine is incorrect to begin with. Check out the project GCJ: The GNU Compiler for the Java Programming Language.

ComFreek
Ignacio Vazquez-Abrams
  • 1
    The Java *language* may not require a VM, but the term "Java" covers more than just the language; it also includes the VM. – skaffman Feb 17 '10 at 08:36
  • And Maxine at least seems to require a JDK1.6 JVM to run on top of. – Thilo Feb 17 '10 at 08:39
  • @kji: I think I was mistaken. It seems that Maxine does compile to a native executable, so there is no need for the JDK JVM after building Maxine initially (it still needs the class library, I believe). – Thilo Feb 17 '10 at 08:49
13

You are asking about the chicken and the egg.

Read: http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29

Yuval Adam
7

The JVM that you need to bootstrap a JVM written in Java probably does not need a lot of features (such as garbage collection and JIT), so it could be very simple. All the more advanced features could then be implemented in Java (which seems to be exactly the point of Maxine: to experiment with new ideas in JVM technology).

Also, Maxine does contain C code, which I guess makes up a minimal runtime environment that is used to get the rest of Maxine going. I take it that the interesting bits (JIT compiler, garbage collection) are then completely implemented in Java.

Thilo
  • Nope. The C code in Maxine is used more like a data definition language; it doesn't actually implement anything interesting. The operating system expects certain structures to be laid out in a certain specific way in memory, and the C compiler knows how to do this, so Maxine uses C to get these structures laid out correctly. But there are only a few places that use C: a minimal launcher, which loads the VM image into memory, writes the load address into a specific location inside the image and then jumps to another specific location. The debugger, because Maxine uses the OS's C debug facilities. – Jörg W Mittag Feb 17 '10 at 10:03
  • The low-level part of JNI, because, well, the whole point of JNI is to integrate with C. And the low-level part of threading, because Maxine uses native threads, which are wildly different from system to system. – Jörg W Mittag Feb 17 '10 at 10:05
  • @Jörg W Mittag: so, does Maxine still run the normal JVM? the VM image that is being loaded is a native binary, right? How is that being created? – Thilo Feb 17 '10 at 10:24
  • 1
    All modern JVMs have a JIT compiler, which compiles JVM bytecode to native machine code. Normally, only the "hot" parts of the code are compiled (the rest is interpreted), and normally the code is held in memory and discarded when the program exits. But there is no reason why you couldn't JIT compile *all* the code, and there is no reason why you couldn't write the code to disk, which is exactly what Maxine does. Oracle's JRockit JVM and Google's V8 JavaScript compiler, for example, also keep compiled native code on disk, although in their case it's just for caching reasons. – Jörg W Mittag Feb 17 '10 at 11:19
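The "compile everything up front, write it to disk, let a tiny launcher load it" idea from the comment above can be sketched as a loose analogy in Python. This is not how Maxine does it: the image here holds Python bytecode rather than native machine code, and `build_image` and `launch` are hypothetical names made up for this illustration.

```python
import marshal

def build_image(source_text):
    """Compile all of the source up front and serialise the result to bytes."""
    code = compile(source_text, "<image>", "exec")
    return marshal.dumps(code)

def launch(image_bytes):
    """Minimal 'launcher': load the image and jump into it, with no recompilation."""
    code = marshal.loads(image_bytes)
    namespace = {}
    exec(code, namespace)
    return namespace

# Build once (this is the step that needs the compiler) ...
image = build_image("def answer():\n    return 6 * 7\n")
# ... then load and run the prebuilt image as often as you like.
loaded = launch(image)
```

The `image` bytes could just as well be written to a file and read back by a later process; the point is only that compiling and running are separate steps.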
3

See bootstrapping.

yawn
2

Java code can be compiled directly to machine code so that a virtual machine is not needed.

macleojw
2

I had a look at Maxine last week and was wondering the same :)

From the Maxine documentation:

1 Building the boot image

Now let's build a [boot image]. In this step, Maxine runs on a host JVM to configure a prototype, then compiles its own code and data to create an executable program for the target platform.

2 Running Maxine

Now that Maxine has compiled itself, we can run it as a standard Java VM. The max vm command handles the details of class and library paths and provides an interface similar to the standard java launcher command.
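The two steps quoted above can be modelled in a few lines, in the spirit of the T-diagrams used in compiler textbooks. This is only a sketch under loud assumptions: the labels "bytecode" and "native" and the names `Compiler`, `translate` and `maxine_like` are invented for the illustration and say nothing about Maxine's real build.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Compiler:
    source: str      # language the compiler accepts
    target: str      # language the compiler emits
    written_in: str  # language the compiler itself is expressed in

def translate(tool, subject, executable_here):
    """Run `tool` over `subject`, yielding `subject` re-expressed in tool.target."""
    if tool.written_in not in executable_here:
        raise RuntimeError(f"nothing here can run code in {tool.written_in}")
    if subject.written_in != tool.source:
        raise ValueError(f"tool cannot read {subject.written_in}")
    return Compiler(subject.source, subject.target, tool.target)

# A compiler from bytecode to native code that is itself bytecode (hypothetical).
maxine_like = Compiler(source="bytecode", target="native", written_in="bytecode")

# Step 1: on a host JVM (which can execute bytecode), it compiles itself.
boot_image = translate(maxine_like, maxine_like, executable_here={"native", "bytecode"})

# Step 2: the result is native code, so the host JVM is no longer needed.
runs_without_host = boot_image.written_in in {"native"}
```

The circularity disappears because the host JVM is needed exactly once, for step 1. Calling `translate` with `executable_here={"native"}` before the boot image exists raises `RuntimeError`, which is the original chicken-and-egg problem.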

ewernli
0

I know this post is old, but I thought I might add a little to the discussion, as there are points that have been missed. Future readers may find this helpful.

I wonder if everyone is missing the point here. You can write almost any kind of compiler, interpreter, or virtual machine in almost any language. When you write a C compiler in C, you need an existing C compiler to compile the new one. However, the output is native code that runs on the designated platform. Just because the JVM is written in the language that runs on the JVM doesn't mean the output must be code that runs on the JVM. For instance, you can write C, Basic, or Pascal compilers, or even assemblers, in Java. In this case you will need the JVM to create the compiler or assembler, but once it is created you may no longer need the JVM if the initial code resulted in native code.

Another approach is to write a translator that converts an input language into a language for which a compiler already exists, so that you write your program in language A, which is translated into language B, which is then compiled into machine code. In the microcontroller world you see this a lot. Someone wants to write programs in Basic or Java, so they write a Basic/Java compiler that produces C code for an existing C compiler. The resulting C code is then compiled into machine language, so the two steps together act as a native Basic/Java compiler. This approach is usually easier than writing the Basic/Java compiler directly in machine code.

Many years ago I wrote BasicA and GWBasic programs that produced assembly code for 6800 and Z80 micros. My point is that the output need not be of the same ilk as the input or target. I.e., just because you're writing a JVM in Java doesn't mean the final result must be run under a JVM.
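To make the "language A is translated into language B, which an existing compiler turns into machine code" idea concrete, here is a minimal sketch in Python. It assumes a made-up input language consisting only of integer arithmetic with `+`, `-` and `*`, borrows Python's own `ast` module as the parser, and emits C source text. The names `emit_c_expr` and `compile_to_c` are hypothetical.

```python
import ast

def emit_c_expr(node):
    """Translate an arithmetic syntax-tree node into a C expression string."""
    if isinstance(node, ast.Expression):
        return emit_c_expr(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.BinOp):
        operators = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
        symbol = operators.get(type(node.op))
        if symbol is None:
            raise ValueError("unsupported operator")
        return f"({emit_c_expr(node.left)} {symbol} {emit_c_expr(node.right)})"
    raise ValueError("unsupported syntax")

def compile_to_c(source):
    """Return a complete C program that prints the value of the expression."""
    expression = emit_c_expr(ast.parse(source, mode="eval"))
    return (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {expression});\n'
        "    return 0;\n"
        "}\n"
    )

c_program = compile_to_c("1 + 2 * 3")
```

Python is needed only to run the translator. Once `c_program` has been handed to an existing C compiler, the resulting executable has no dependency on Python at all, which is the same reason a JVM written in Java need not end up running on a JVM.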

user693336
0

Here is a good paper on bootstrapping a self-hosted VM. It's about JavaScript rather than Java, but the principles are the same.

Bootstrapping a self-hosted research virtual machine for JavaScript: an experience report

Note that while bootstrapping a self-hosted compiler and bootstrapping a self-hosted VM are somewhat similar, I believe they do not raise exactly the same challenges.

ewernli
0

You can have a look at the well-established method of bootstrapping compilers. I think it started in the 70s...

malaverdiere
0

It is kinda 'whooaoaa man, how can that work???' - but I think you are describing the phenomenon known as 'self-hosting':

Languages (or toolchains/platforms) don't start out as self-hosting. They start off life having been built on an existing platform; at a certain point they become functional enough that you can write programs in them which understand the very syntax those programs are written in.

There is a great example in the classic AWK book, which introduces an AWK program which can parse (a cut-down version as it happens) other AWK programs: see link below.

There is another example in the book "Beautiful Code", which has a JavaScript program that can parse JavaScript.
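The same trick is easy to try in Python. The following is a minimal sketch, not taken from either book: the program's text is kept in a string so that it can be both executed and handed to itself as input. Python's standard `ast` module stands in for a hand-written parser, and `count_functions` is a hypothetical name.

```python
# The program's own text, kept in a string so it can be both run and parsed.
SOURCE = '''
import ast

def count_functions(text):
    """Parse Python source text and count the function definitions in it."""
    tree = ast.parse(text)
    return sum(isinstance(node, ast.FunctionDef) for node in ast.walk(tree))
'''

namespace = {}
exec(SOURCE, namespace)                        # run the program ...
count_functions = namespace["count_functions"]
functions_in_itself = count_functions(SOURCE)  # ... and feed it its own text
```

Nothing paradoxical happens: by the time the program reads its own source, it is already running on an existing Python interpreter, which is the "existing platform" described above.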

I think the thing to remember is this: if you have (say) a JVM written in Java, which can therefore run Java bytecode, then the JVM that runs that Java-written JVM has to be hosted natively itself (perhaps it was written in C and then compiled to machine code). This is eventually true of any self-hosting program, somewhere along the line.

So the mystery is removed - because at some point, there is a native machine-code program running below everything.

It's kind of the equivalent of being able to describe the English (etc.) language using the English language itself... maybe...

http://www.amazon.co.uk/AWK-Programming-Language-Alfred-Aho/dp/020107981X/ref=sr_1_fkmr0_3?ie=UTF8&qid=1266397076&sr=8-3-fkmr0

http://www.amazon.co.uk/gp/search/ref=a9_sc_1?rh=i%3Astripbooks%2Ck%3Abeautiful+code&keywords=beautiful+code&ie=UTF8&qid=1266397435

http://en.wikipedia.org/wiki/Self-hosting

monojohnny