5

I'm building a toy compiler from a C like language to a stack machine and I'm at the point where I need to figure out what to do with functions and block local variables. Thinking through it abstractly it looks like I have two options at opposite ends of the spectrum: 1) Preprocess and preallocate stack space for each variable, 2) Add special instructions to the VM to walk the stack.

Pre-process and preallocate stack space for each variable

This has the advantage of giving me all the addresses for the variables ahead of time so I don't have to be very clever or add any extra instructions to the VM for walking the stack. The downside is that it is potentially very wasteful because conditional code that never executes but declares a whole bunch of variables will take up a lot of unnecessary space. For example,

a : t1 = value;
if (test) {
  b : t2; c : t3; d : t4; ...;
}

In the above code even if test is always false I will still allocate space for all those variables within the conditional branch.

Add special instructions to the VM to walk the stack

The other approach I could come up with was to generate code for each variable declaration and then add some special VM instructions to figure out the address of those variables at runtime. This solves the problem of wasted stack space but then adds a computational overhead that I can potentially solve by some caching methods.

So what's the correct approach and is there another approach I didn't think of that is better?

Seki
  • 11,135
  • 7
  • 46
  • 70
David K.
  • 6,153
  • 10
  • 47
  • 78

1 Answers1

7

The idea of a stack machine is that it does computations on an operand stack. It doesn't mean everything has to be stored on a stack. This is a common misconception. Typically your local vaiables / block scoped access maps to a register operation.

.NET CLR and Java both have instructions to store and fetch "local" variables as well as other types of variables. I would suggest you follow suit, because you don't want to walk the stack for simple variable access. That is very inefficient. It should be efficient to load / store variables, like a register. Most stack machines still have random access storage.

In the CLR, we also pre-allocate all of our local variables at the beginning of each method. The variables you preallocate may be a mix of explicit high level variables and compiler generated temporaries. But nothing says they have to be on a stack. On the VMs I've worked on we implemented them in a fast access area like an associative array or a vector like structure. I recommend that you use ildasm to disassemble a .NET method and take note of how the local variables are declared and handled.

Example:

 total = apples + oranges

Maps to:

 ldloc 'apples'   # load a local onto stack
 ldloc 'oranges'  # load a local onto stack
 add              # add 2 operands on stack
 stloc 'total'    # store local from stack

In a previous answer I gave an explanation of stack based machines and compared to register machines. I hope there is some useful information in it. https://stackoverflow.com/a/24301283/257090

It is fairly straightforward to implement a prototype of stfld (store field) and ldfld (load field) with a simple Dictionary or HashTable. Later, you can optimize the assemble or runtime to compile symbolic named based references to integer references, especially if there is no need for runtime lookup of variables by name. However, for reflection APIs, you will need to implement additional metadata to cross-reference the numerical or pointer addresses back to their original names.

PS: If I've misunderstood your question, or you want to discuss more, please comment.

Community
  • 1
  • 1
codenheim
  • 20,467
  • 1
  • 59
  • 80
  • I was thinking along similar lines. Can you elaborate on what you mean by pre-allocating all the local variables at the beginning of each method. More specifically how do you deal with dead code? Are the local variables in the dead section pre-allocated as well if you didn't have dead code elimination. – David K. Jul 19 '14 at 04:16
  • Let me answer WRT dead-code first: Don't do anything with dead code, it isn't your concern (as a VM / interpreter). Leave that up to the compiler / programmers. Pre-allocation makes the implementation simple, but also allows the compiler some flexibility to map its high level variables or temporary variables (compiler generated variables for things that don't map to an explicit variable declaration). So yes it means that you can't necessarily use the high-level variable name for your intermediate code (for example, when 2 variables of the same name exists in the source). – codenheim Jul 19 '14 at 04:23
  • Well, in this instance I'm all three: vm implementer, compiler writer, programmer so there are some cross-cutting concerns I'm trying to balance. In any case, thanks. This was very helpful. – David K. Jul 19 '14 at 04:27
  • 1
    When compiling your method, you can decide how many high level variables are stored in locals, and you can also decide how many temporaries to allocate (for keeping values in fast register memory). Allocate them all on some sort of random access stack or structure that gets saved/restored during method calls. You use opcodes like storelocal or loadlocal to retrieve the values from globals, fields, etc. into your locals and to save the values after expressions have completed. Locals are purely symbolic, and a JIT may map them to registers. – codenheim Jul 19 '14 at 04:27
  • @davidk01 - Feel free to email me, my contact info is in my profile, I am happy to answer questions to help you with your VM. One of the best VMs to learn from is the CLR. Use visual studio to study how it compiles high level code. Once you learn the IL, you will get a feel for how the VM works, and it will give you ideas for your own. But that doesn't mean yours has to work the same way. There are many ways to design a VM around the same basic ideas. – codenheim Jul 19 '14 at 04:39
  • Will do. Really appreciate the help. – David K. Jul 20 '14 at 01:53