Compiling switch statements for a simple VM

Question

So I'm compiling a subset of C to a simple stack VM for learning purposes and I'd like to know how to best compile a switch statement, e.g.

switch (e) {
  case 0: { ... }
  case 1: { ... }
  ...
  case k: { ... }
}

The book I'm going through offers a simple way to compile it with indexed jumps but the simple scheme described in the book only works for contiguous, ascending ranges of case values.

Right now I'm using symbolic labels for the first pass and during the second pass I'm going to resolve the labels to actual jump targets because having labels simplifies the initial compilation to stack instructions quite a bit. My idea right now is to generalize what's in the book to any sequence of case values in any order with the following scheme. Call the case values c1, c2, ..., cn and the corresponding labels j1, j2, ..., jn then generate the following sequence of instructions assuming the value of e is on top of the stack:

dup, loadc c1, eq, jumpnz j1,
dup, loadc c2, eq, jumpnz j2,
...
dup, loadc cn, eq, jumpnz jn,
pop, jump :default
j1: ...
j2: ...
...
jn: ...
default: ...

The notation is hopefully clear but if not: dup = duplicate value on top of stack, loadc c = push a constant c on top of the stack, eq = compare top two values on the stack and push 0 or 1 based on the comparison, jumpnz j = jump to the label j if the top stack value is not 0, label: = something that will be resolved to an actual address during second compilation pass.

So my question is then what are some other ways of compiling switch statements? My way of doing it is much less compact than an indexed jump table for contiguous ranges of case values but works out pretty well when there are large gaps.

Similar to and indexed jump: put the values in a list and loop, followed by an indexed jump. That will handle values with large gaps. — cup, Mar 04 '14 at 09:41
Use a binary tree of branches to minimize the average number of branches. All continguous or near-contiguous sequences should be done with a jump table. — Jim Balter, Mar 04 '14 at 09:43
Normally a jump table should be used, unless the gaps between values are too large - but then you either have to find a cheap function that maps the source set of values to something more compact or translate it into a sequence or a tree of ifs. — SK-logic, Mar 04 '14 at 09:43
@JimBalter: I like that idea. Can you make it an actual answer because I'd like to accept it. — David K., Mar 04 '14 at 11:19

Jim Balter · Accepted Answer · 2014-03-04T12:15:54.530

First sort all the cases. Then identify all (large enough to be worthwhile) continguous or near-contiguous sequences, and treat them as a single unit that is handled with a jump table. Then, instead of your linear sequence of compares and jumps, use a balanced binary tree of branches to minimize the average number of jumps. You do this by comparing against the median of the cases or blocks of cases, recursively.

Compiling switch statements for a simple VM

1 Answers1