Given a recursive descent parser, how should I modify it for Syntax Analysis?

Question

I have just finished coding a recursive descent parser for C Minus that simply prints "ACCEPT" if the input text file can be parsed or "REJECT" if not. I have a function for each rule in the grammar that just returns true or false. Here are some of the checks that must be implemented in the syntax analysis:

Basically, my question is, what is the smart way to do this?

I do not have a symbol table right now at all, but I heard classmates say that I can simply put certain checks into the functions themselves. For example, to not allow a float as an array index, just find the function in my code that has the brackets and put some "check" in there to return an error if a float is between two brackets.

A parser already does syntax analysis, by definition, and that's not what your question is actually about. — user207421, Mar 05 '19 at 14:16

Ira Baxter · Accepted Answer · 2019-03-04T23:24:49.300

2

Your list of checks are not "syntax" analysis in the sense of grammar checks.

They are "syntax" checks if you interpret only completely valid programs as language instances.

Most of the compiler community would call this "semantic" (validity) checking.

Usually such checks are about whether operators are applied to operands according to the language rules (e.g., "operator [] cannot be used except on array types").

To do this you have to know, for each operator (e.g, "[]") what the type of the operand is. To do that, people typically build symbol tables that map identifiers to the type of the identifier, and map source code regions to sets of identifiers that are valid in those regions ("scopes"). With this information you can check that an operator applied to a symbol make sense.

Now a complication occurs: some operators apply to expressions, e.g., other compositions of operators, e.g,: foo(x)[17] which I intend to mean "foo is a function that is called and returns an array. So now you have to associate type information with every expression.

The easiest way is to associate type information with every node in the tree. You'll need to go read about symbol tables, how to build one by doing a tree walk, and then how to walk the tree labelleling each node with a type (identifiers are easy: the type is what the symobl table says the type of the identifier is). Usually you can do a bottom-up tree walk, labelling leaf nodes first, then check the operator nodes above them for validity, and computing the type of operator node after the validaty check passes, and continuing this process to label tree nodes as your analyzer climbs the tree.

[For some languages, a bottom up scan doesn't work; you have iteratively pass information up and down. Ada is rather famous for this].

A formal characterization of this process is called "attribute evaluation". See https://en.wikipedia.org/wiki/Attribute_grammar. You don't need to be this "formal" to implement the ideas, you can do the bottom up case by hand.

That's the smart way. Other approaches might be possible, but they are hard to imagine and likely hard to make work "right".

edited Mar 04 '19 at 23:24

answered Mar 04 '19 at 23:15

Ira Baxter

93,541
22
172
341

Thank you for the answer! How should I build the symbol table? Should I do it inside of my RD parser as I’m parsing or should I have the table made before hand in my lexical analysis? And what attributes should my “node” have? Do they need to have left and right children? – Mar 05 '19 at 02:34
Parse and build the AST first; this keeps this step separate. Bulild the symbol table by walking the AST; don't tangle this into lexing (where you really can't do it very well) or parsing (which has enough to do). Symbol table nodes need to contain the name of the identifer that the node represents, the location in the AST where the identifier is declared (or the source line number if you collected that). Attibutes... depends on what your language allows. Typical is a type attribute, which is a seperate kind of node. I often find it convenient to attach a list of used-at locations ... – Ira Baxter Mar 05 '19 at 13:41
Your type nodes need to contain a scalar type: int, char, float, array, pointer, struct (with members), function (signature). A pointer type needs to have a link to another type node representing the type of the pointed-to entity. The array type needs indications of array dimensionality, and a ponter to a type node containing the type of array elements. A struct node needs to have a set (sequence) of pointers to *symbol table* node that make up the struct members. in order. A function node needs a sequence of pointers to type nodes that represent the return type and the params – Ira Baxter Mar 05 '19 at 13:46
I don’t even know how to build the AST...do I manipulate my RD parser for that? And if it’s not done in Lexing or parsing then when is it done? In between? How do I keep track of what scope its in? Can I just store all variables in a table named after the function they fall under? – Mar 05 '19 at 13:48
Sorry about all the questions, my professor hasn’t really given us any guidance and it is due in 3 weeks – Mar 05 '19 at 14:01
See my answer on how to build recursive descent parsers: https://stackoverflow.com/a/2336769/120163 If you follow the links in that answer, one of them https://stackoverflow.com/a/25106688/120163 will tell you how to integrate building ASTs into the parsing machinery. ... Why hasn't the professor told you how to do this, or pointed you at a compiler book that has sufficient background? Have you asked him this question? – Ira Baxter Mar 05 '19 at 18:55
How do you keep track of scopes? Easy. Before starting a tree walk, push a pointer to a scope into a scope stack. Now walk the tree in-order; each time you encounter a scope-introducer, push a new scope onto the scope stack. Each time you encounter scope-terminator, pop the top scope from the scope stack. As you encounter declarations, put symbol table entries for them in the scope on top of the stack. Each time your encounter a symbol use, look it up in the scope on top of the stack. This trick works for most ("lexcially scoped") languages. If your language is more complicated... – Ira Baxter Mar 05 '19 at 18:59
... you'll have to do something corresponding more complicated but I suspect it is not. Try this first. – Ira Baxter Mar 05 '19 at 18:59

Given a recursive descent parser, how should I modify it for Syntax Analysis?

1 Answers1