
My question is the same as the title. I just want to know whether there are any other translation techniques for producing the intermediate code that don't rely on embedding actions into the parser (that is, the parser will strictly create the abstract syntax tree; it won't generate any code). Thanks for any answers.

Sean Kelleher
  • I suppose you could build a neural network, and train it, without any embedded knowledge - do you have a specific need to avoid embedding actions in the parser? – Damien_The_Unbeliever Jul 10 '11 at 18:13
  • No, I just would like to know if there are any other options and I find it difficult to find information on the matter. – Sean Kelleher Jul 10 '11 at 18:27

2 Answers

If you have a parser, then the parser has to do something more than just "recognize" the input stream as a valid instance of the language. If you want the compiler to produce anything, some kind of action must be attached to the points at which language fragments are matched. In some sense, it can't be anything but "syntax directed"; at the parsing stage, all you have is syntax.

Fundamentally, the parsing actions have to build a representation of the program that is "more compilable". I only know of a few basic methods:

  • Generate a set of virtual machine instructions
  • Build an abstract syntax tree to be passed to the rest of the compiler (a sketch of this option follows the list)
  • Build some kind of control/data flow representation of the code (e.g., "triples")
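
To make the AST option concrete, here is a minimal sketch (mine, not the answerer's code) of a recursive-descent parser for a toy expression grammar whose only embedded "action" is building AST nodes; no code is generated during parsing. The grammar and node classes are invented for illustration.

    import re
    from dataclasses import dataclass

    @dataclass
    class Num:              # leaf node: an integer literal
        value: int

    @dataclass
    class BinOp:            # interior node: a binary operator
        op: str
        left: object
        right: object

    def tokenize(src):
        # numbers plus the operators + and -, nothing else
        return re.findall(r"\d+|[+\-]", src)

    def parse(tokens):
        pos = 0

        def term():
            nonlocal pos
            node = Num(int(tokens[pos]))
            pos += 1
            return node

        def expr():
            nonlocal pos
            node = term()
            while pos < len(tokens) and tokens[pos] in "+-":
                op = tokens[pos]
                pos += 1
                # the only "action": link a new tree node, emit nothing
                node = BinOp(op, node, term())
            return node

        return expr()

    print(parse(tokenize("1+2-3")))
    # BinOp(op='-', left=BinOp(op='+', left=Num(value=1), right=Num(value=2)), right=Num(value=3))

The rest of the compiler then consumes the returned tree; the parser itself never emits an instruction.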

In the abstract, these are all pretty much the same, in that the parser-generated representation contains linked structures whose elements have some direct semantic interpretation. These bits of structure with implied semantics are what the rest of the compiler keys off.

Now, the text is a representation of the program. You can avoid the "parsing" process (sort of) completely if you decide to implement your compiler as a Post system, a set of rewriting rules over strings. These rules take the form "if you see this string, replace it by this other string". Post systems are provably Turing capable, so technically you can transform your source code into a string representing your target program with a sufficiently clever set of string rewriting rules. Nobody I know builds real compilers this way; I'm sure if you dig hard enough, you can find an obscure technical paper that does this.
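
As a toy illustration of that idea (my own invented rules, not a real compiler), the whole "compiler" below is just a loop that keeps applying string-rewriting rules until none of them match:

    # Each rule says: "if you see this string, replace it by this other string".
    RULES = [
        ("inc x", "x := x + 1"),
        ("dec x", "x := x - 1"),
    ]

    def rewrite(program):
        changed = True
        while changed:
            changed = False
            for pattern, replacement in RULES:
                if pattern in program:
                    program = program.replace(pattern, replacement, 1)
                    changed = True
                    break  # restart from the first rule after every change
        return program

    print(rewrite("inc x; dec x"))   # x := x + 1; x := x - 1

A real Post-system compiler would need a far larger and cleverer rule set, but the control structure need be no more than this.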

One could reasonably argue that matching strings, as required by the Post system, is a kind of parsing (e.g., recognizing interesting structure), putting you back at the original question.

It is interesting to note that program transformation systems (I build one of these commercially) typically use a compiler front end to build ASTs, and then apply AST tree-to-tree rewrites to achieve their purpose. The reason they are constructed this way is because they are really Post systems in disguise; any AST for your program can be trivially converted into a string (e.g., S-expressions), and the tree rewrites can be converted into equivalent string rewrites. So a tree-rewriting system is just a Post system, which makes it very powerful. Of course, one can combine this capability with other, more traditional compiler methods, which is what we do with our product. That makes it more convenient; you don't have to do everything as a Post system.
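
For a flavor of what such a tree-to-tree rewrite looks like (a minimal sketch of my own, not the commercial tool mentioned above), here is a single constant-folding rule applied bottom-up to an S-expression-style AST built from nested tuples:

    def fold(node):
        if not isinstance(node, tuple):
            return node
        node = tuple(fold(child) for child in node)      # rewrite children first
        # rule: (+ (num a) (num b))  ->  (num a+b)
        if node[0] == '+' and node[1][0] == 'num' and node[2][0] == 'num':
            return ('num', node[1][1] + node[2][1])
        return node

    ast = ('+', ('num', 1), ('+', ('num', 2), ('num', 3)))
    print(fold(ast))   # ('num', 6)

The same rule could be written as a string rewrite over the printed S-expressions, which is the sense in which the tree rewriter is a Post system in disguise.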

Ira Baxter
  • Thanks very much for the detailed answer. Does this mean that you can use the AST instead of the IR for generating the code? That's what I had in mind originally, but I wasn't sure if it was possible... – Sean Kelleher Jul 10 '11 at 18:39
  • One can generate simple code directly from the AST by simply walking over it and generating code for each operator/operand as encountered. The resulting code will be pretty much what you would get if you generated directly from the parse. More complicated schemes walk over the AST, generate "triples", and then pass those to the rest of the compiler. If you really want to know how compilers generate code, you ought to consult a standard text; see this SO answer: http://stackoverflow.com/questions/1669/learning-to-write-a-compiler – Ira Baxter Jul 10 '11 at 20:18
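
A minimal sketch of that direct-from-AST emission (illustrative tuple ASTs and opcode names of my own, not from the comment), targeting an imaginary stack machine:

    OPCODES = {'+': 'ADD', '-': 'SUB'}

    def emit(node, out):
        if node[0] == 'num':
            out.append(f"PUSH {node[1]}")    # leaves push their value
        else:
            emit(node[1], out)               # left operand
            emit(node[2], out)               # right operand
            out.append(OPCODES[node[0]])     # operator last, stack-machine style
        return out

    code = emit(('+', ('num', 1), ('-', ('num', 2), ('num', 3))), [])
    print(code)   # ['PUSH 1', 'PUSH 2', 'PUSH 3', 'SUB', 'ADD']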

What you describe is in fact the standard compiler design. It is possible to write one-pass compilers that produce object code during parsing, and indeed I have done so, but the norm is for the parser to produce an AST and for a subsequent traversal of the AST to produce either the output or, in many cases, a further intermediate form such as triples, RTL, etc.
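
As a rough sketch of that second arrangement (my own toy code, not the answerer's), the traversal below turns a tuple-encoded AST into "triples" whose operands are either leaves or references to earlier triples:

    def to_triples(node, triples):
        if node[0] == 'num':
            return node[1]                    # leaves are used directly
        left = to_triples(node[1], triples)
        right = to_triples(node[2], triples)
        triples.append((node[0], left, right))
        return ('t', len(triples) - 1)        # reference to the triple just made

    triples = []
    to_triples(('+', ('num', 1), ('-', ('num', 2), ('num', 3))), triples)
    for i, t in enumerate(triples):
        print(i, t)
    # 0 ('-', 2, 3)
    # 1 ('+', 1, ('t', 0))

A later pass can then walk the triples to select instructions, entirely independent of the parser.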

user207421