I am always curious as to why the JVM and CLR have a stack-based architecture? Why don't they use a register-based approach? What benefits does it have over the register-based approach?
-
possible duplicate of [Advanatges of stack-based architecture of the JVM's instruction](http://stackoverflow.com/questions/10515154/advanatges-of-stack-based-architecture-of-the-jvms-instruction) – Greg Hewgill Oct 28 '12 at 21:50
2 Answers
I used to ponder the differences between register and stack machines and compare instruction sequences, and run benchmarks...
Then I spent a couple of years implementing both types of machines while working on the Parrot VM, which was a register machine. We started, naively, with a fixed register set, in combination with data and register stacks, but eventually concluded that it was an artificial limitation, so we changed to an infinite register set and an allocator. At some point, the Parrot fast core (GCC computed goto) outperformed Mono and JVM interpreter cores (non-JIT), but the difference came down to the JIT. Parrot's JIT never matched the quality of the others. It is the quality of the JITter that makes the eventual machine, and that is generally what people care about. If all VMs played by the same rules (ie. they had a constraint to run in interpreted mode with no JIT), then my evidence shows a register machine has the performance edge on an equivalent stack machine. Larger instructions, but fewer of them == higher throughput (IPC), and better cache locality of reference. The Dalvik JVM actually supports my findings, Dalvik had no JIT for a couple of years, and competed with its interpreter core.
Very few mainstream VMs run in interpretation mode exclusively (AFAIK), they JIT compile, and thats what we benchmark. The point of the interpreter core is to establish a presence on the platform, do bytecode verification, and provide a failsafe execution core in absence of the JIT. This isn't a rule, of course; there are billions of devices running ARM accelerated JVM without JIT, but in the absence of memory or CPU constraints, this applies.
I worked and worked at tweaking the core, testing and tuning, only to find that in the end we really wanted a fast JIT. I arrived at the conclusion that if you are going to eventually JIT, it doesn't matter much whether you implement a stack or register machine to start, do what you like; but you will get "to market" faster with a stack machine. Doing a lot of pseudo-register-machine virtual optimizations for bytecode interpretation by a virtual machine core is partially a wasted effort, because it isn't real native optimization. The soft-core doesn't do branch prediction, register renaming, instruction reordering, parallel execution or prefetch like a real processor. My feeling is that once we have a high quality JIT to native binary, we arrive at the same destination.
For those reasons, I technically favor a stack based machine for:
- Simplicity - Much less code to maintain = less bugs
- Time to implement
But visually, and emotionally I favor a register machine for:
- Visual-Conceptual models more closely match the machine, and my brain
- Flexibility - Compilers can evaluate their expression trees in different orders using SSA.
Note I didn't say compilers could more "easily" generate code. That seems to be what people who have worked mostly with stack machines like to argue. I don't believe that and didn't find that to be true. I saw many hobby compilers written in a short time on both Parrot and the CLR, though I would admit the ones on the CLR are of higher quality, but that is mainly one of ecosystem and quality of available tools. I wrote compilers on both platforms myself, and found there are tradeoffs, but not enough to lose sleep over.
This is an educated guess, because my real-world experience does not include writing a full JITter so I don't have first-hand experience comparing the pros and cons of JITting various forms of opcodes, but my opinion is, if you plan to include a JIT, then creating an extremely sophisticated virtual machine opcode core amounts to premature optimization. Your time is better spent elsewhere.

- 20,467
- 1
- 59
- 80
It is usually not appropriate to just link out to an article but this time I'll make an exception: This article by Eric Lippert answers just this question.

- 168,620
- 35
- 240
- 369