3

Hi I'm writing rpn calculator in C and I'd to compile rpn expression to byte-code. But I'm not sure what would be the data structure to represent them??? My stack data structure so far is

 struct stack_t{
        int type;
        union {  
              double val;
              char *str;
              /* probably some more */
             }u;
    };
Nyan
  • 2,360
  • 3
  • 25
  • 38
  • 1
    Canonical compiler resources question here: http://stackoverflow.com/questions/1669/learning-to-write-a-compiler . Most of the resources listed therein will tell you how to build a general AST, which will certainly do the job, though there might be a simpler choice for your limited domain. – dmckee --- ex-moderator kitten Feb 21 '10 at 23:05

3 Answers3

1

It actually depends a lot on the feature set you are going to support. For example, if your calculator supports integers (e.g., integer division), you probably would need int in your union.

Vlad
  • 35,022
  • 6
  • 77
  • 199
0

The advantage of RPN bytecode is that it doesn't need to be stored in a special structure. An array/vector/list will do. You need a stack to interpret it, but to store the program you should only convert each token to a bytecode instruction.

As for what types to put in your union, it really depends on the semantics of your calculator. Why do you want to be able to store strings? Do you plan on using strong typing?

Look into Forth, the canonical simple RPN language.

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
  • my guess is that strings may be needed if we'd like to represent variables – Vlad Feb 22 '10 at 00:03
  • calculator + variables = interpreter – Potatoswatter Feb 22 '10 at 00:59
  • The calculator so far supports real number and string data type and variables also. Now I'd to add function definition and need to compile the input rather than interpreting on the fly. – Nyan Feb 22 '10 at 03:40
  • @Nyan: Note that Forth is typically compiled directly to machine code, even on the fly. It's also extensible to allow alternate RPN or even non-RPN syntax. See how fast an implementation of you language in Forth is before you put too much effort into your own. – Potatoswatter Feb 22 '10 at 03:50
0

I've gone through several re-writes of a postscript interpreter, and while my object struct started out exactly like yours, I eventually found this layout to be "better" for reasons that I may not be able to articulate. I found this idea in the X11 Event structure.

typedef struct {
    word tag;
    word pad;
    dword padw;
} mark_;

typedef struct {
    word tag;
    word pad;
    integer val;
} int_;

typedef struct {
    word tag;
    word pad;
    real val;
} real_;

typedef struct {
    word tag;
    word sz;
    word ent;
    word off;
} comp_;

typedef union {
    word tag;

    mark_ mark_;
    int_ int_;
    real_ real_;
    comp_ comp_;
} object;

It may seem redundant to have the tag both outside and inside each structure, but it keeps everything in nice little boxes. It allows you to see the memory layout more clearly.

Edit: I call it "tag" instead of "type" because it contains both type and flags.


Edit: One of the reasons I consider this better is it eliminates the .u term from the access expressions. Instead of stk.u.val, you have stk.real_.val or stk.int_.val. The union part is shifted upstream, so the data portions can have internal structure without additional nesting. And the tag is still accessible at the top-level to determine which sub-type is being used.

luser droog
  • 18,988
  • 3
  • 53
  • 105