Writing simple parsers is trivial, and I have implemented several over the years. In college, we also had to write one. But we never had to generate meaningful output using this method; we never learned how to create a back end.

If I have a working recursive descent parser for a simplified Pascal and I wanted to translate the code to C++, how would I go about doing this? I don't see a need for an intermediate step such as generating an abstract syntax tree.

So how do I go about outputting the compiled or translated code? The only useful example I have found is in Jack Crenshaw's tutorial, but it focuses more on the front end, as do most other resources. The relationship between my parser code and my grammar is very obvious. What about the relationship between the parser's methods and the output? My parser method for a function declaration can correspond to exactly one EmitLn(C++ code here) call. But what about parser methods that aren't that simple, such as expressions? An expression breaks down into possibly many more calls, which suggests the need for an Emit() function that lets me build up the output code for an expression piece by piece. Is there any boilerplate code for outputting code, such as the EmitLn function in Jack Crenshaw's Let's Build a Compiler? This also shows me I need to maintain a basic symbol table, another thing that is often omitted in most examples.
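To make the question concrete, here is a minimal sketch of the kind of Emit/EmitLn boilerplate I have in mind, in Python for brevity. Everything except the EmitLn name is my own invention, not from Crenshaw's tutorial: a pre-tokenized input, a set used as a bare-bones symbol table, and an expression rule that builds its output string piece by piece before a statement rule emits a whole line.

```python
# Hypothetical sketch: a recursive descent translator that emits C++-like
# output on the fly. Names other than emit_ln are assumptions, not Crenshaw's.

class Translator:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.symbols = set()   # minimal symbol table: names seen so far
        self.output = []       # emitted target-language lines

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def emit_ln(self, line):
        # one statement-level parser rule -> one output line
        self.output.append(line)

    def expr(self):
        # expr : term (('+'|'-') term)*  -- builds output piece by piece
        code = self.term()
        while self.peek() in ('+', '-'):
            op = self.next()
            code = f"({code} {op} {self.term()})"
        return code

    def term(self):
        tok = self.next()
        if tok.isidentifier():
            self.symbols.add(tok)  # record the name in the symbol table
        return tok

    def assignment(self):
        # assignment : ID ':=' expr  -- emits exactly one line
        name = self.next()
        assert self.next() == ':='
        self.symbols.add(name)
        self.emit_ln(f"{name} = {self.expr()};")

t = Translator(["x", ":=", "a", "+", "2", "-", "b"])
t.assignment()
print(t.output[0])   # x = ((a + 2) - b);
```

The split mirrors the intuition in the question: statement-level rules map to one emit_ln call, while expr returns a string so its pieces can be assembled before anything is emitted.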

Am I right? What else should I know? Any tips, suggestions, or resources? My big question is: there are so many tutorials out there for compiler front ends, but where is the explanation of the back end? I can parse languages; now I want to evolve that into being able to translate and compile them into other languages.

Austin Henley

2 Answers

There are trivial compilers using "on the fly" code generators that spit out code as they parse, as you seem to be trying to do. The good news is that they look easy.

The problem is that compilation is fundamentally about spreading information from one part of the code to another, to enable good code generation. For that, compiler people long ago decided to separate parsing from code generation. The parser parses the code and builds another intermediate data structure (often an abstract syntax tree or triples) that represents the program.
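The separation described here can be sketched in a few lines. This is a toy illustration of mine, not from the answer: the parser's job ends with building a tree, and a separate pass walks that tree to generate code, so later analysis passes could inspect or rewrite the tree in between.

```python
# Toy two-phase sketch (illustrative): the front end builds an AST,
# and an independent back-end pass walks it to generate code.

class Num:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

def gen(node):
    """Walk the AST and return a C-like expression string."""
    if isinstance(node, Num):
        return str(node.value)
    return f"({gen(node.left)} + {gen(node.right)})"

tree = Add(Num(1), Add(Num(2), Num(3)))  # what a parser might build for 1 + 2 + 3
print(gen(tree))   # (1 + (2 + 3))
```

Because the whole program is in hand before any code is emitted, a pass over the tree can use context from distant parts of the program, which is exactly what on-the-fly emission cannot do.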

Often, very deep analysis is then lavished on the intermediate structure to determine how to generate good code, followed by appropriate good code generation. The techniques for doing this well are complex, but well motivated by the need to produce good (and correct) code.

Try checking out the standard compiler resources: Learning to write a compiler

It is well worth your time to read some of these references in detail instead of hacking. If you insist on just one, Aho and Ullman is the classic book.

If you want to see how on-the-fly code generation can be made to work reasonably well, check out metacompilers.

Ira Baxter
  • You didn't really answer my question or provide me with any specific thoughts on how to generate code while in the midst of parsing, but you did provide good information and the metacompiler suggestion is extremely interesting, thank you. – Austin Henley Oct 20 '11 at 19:34
  • @Azh: Right, I did not because I wanted to emphasize the other point. It is easy to "generate (bad) code" by simply inserting print statements in the recognition part of the recursive descent parser code. The questions are, how to do this, and what to print? The metacompiler (MetaII, referenced) carries this to an extreme, by letting you specify what to print interleaved with the grammar; both are compiled to a recursive descent compiler that spits code on the fly. To generate good code, though, you need *context*, e.g., memory of other parts of the program, and that's where this fails. – Ira Baxter Oct 20 '11 at 20:52
Short answer: After matching a rule in your grammar, call a function that does something based on the input.

Long answer: From Language Implementation Patterns by Terence Parr discussing Syntax-Directed Interpreters:

a syntax-directed interpreter doesn't create an AST or translate the source code to bytecodes... The interpreter directly feeds off of syntax to execute statements.

This is precisely the idea that I was going for in my original question. Continuing in the book, he describes the technique, how it works, what it consists of, and its limitations. It works for DSLs and small languages rather than general-purpose languages.

He states that constructs such as IF statements and loops are awkward to implement. While this is likely true, I had no trouble implementing such features in a language of mine, Peay BASIC. Languages that are sequences of instructions or simple statements are ideal for this pattern; he gives text-processing languages as an example.
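A minimal sketch of the pattern, to make it concrete (my own illustration, not Parr's code): no AST or bytecode is produced; each matched assignment immediately executes its action against a variable store.

```python
# Minimal syntax-directed interpreter sketch (illustrative, assumes a tiny
# language of "name = term + term + ..." statements). Each matched rule
# executes immediately; nothing is built or emitted for later.

def interpret(lines):
    env = {}  # the interpreter's variable store
    for line in lines:
        name, _, rhs = line.partition("=")
        terms = [t.strip() for t in rhs.split("+")]
        # "match this, call that": the assignment rule's action runs now,
        # evaluating each term as either a variable or an integer literal
        env[name.strip()] = sum(env[t] if t.isalpha() else int(t)
                                for t in terms)
    return env

env = interpret(["a = 2", "b = a + 3"])
print(env["b"])   # 5
```

The limitation Parr points out shows up immediately: a forward jump or loop would need the interpreter to revisit earlier input, which this strictly left-to-right, execute-as-you-match structure does not support.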

The grammar is extended to consist of "match this, call that" pairs, so the necessary rules in your grammar should each have an action associated with them. Example of an assignment statement:

assignment : ID '=' expr {interp.assign($ID, $expr.value);};
expr returns [Object value] : ... ;
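In a hand-written recursive descent parser, the embedded action translates into an ordinary function call at the end of the rule's method. This is a hypothetical sketch of mine, not from Parr's book; the expr rule is stubbed out as an integer literal to keep it short.

```python
# Hypothetical hand-written equivalent of the grammar fragment above:
# the method for `assignment` matches ID '=' expr and then fires the
# action, just like the embedded {interp.assign($ID, $expr.value);} block.

class Interp:
    def __init__(self):
        self.memory = {}

    def assign(self, name, value):
        # the "call that" half of the match/call pair
        self.memory[name] = value

def parse_assignment(tokens, interp):
    name = tokens[0]                 # match ID
    assert tokens[1] == "="          # match '='
    value = int(tokens[2])           # stand-in for the expr rule's value
    interp.assign(name, value)       # action attached to the rule

i = Interp()
parse_assignment(["x", "=", "42"], i)
print(i.memory["x"])   # 42
```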
Austin Henley
  • Parr's description is clearly about syntax-directed *interpreters*, but you asked specifically about *compiled* or *translated* code. As one distinguishing difference, a syntax-directed *compiler* can easily emit a "goto" statement to be executed later, and Parr (rightly) says an *interpreter* can't handle these as well... because it has no knowledge of where the target is. I think you've misinterpreted what Parr said. I don't know how you implemented your Basic, but if you can execute gotos, you aren't generating code on the fly, which is what syntax-directed translators usually do. – Ira Baxter Aug 31 '12 at 03:46
  • Why do I even question you? Oh yes, because I learn so much every time... I will leave my answer but accept yours once again. I had a feeling of deja vu when I read this chapter the other day and couldn't help but think it applied to this question. – Austin Henley Aug 31 '12 at 03:51