8

I'm trying to write a compiler for C (simpler grammar though).

There is something that I've been stuck on for a while. If I understated correctly, all binary operations are left associative. So if we have we "x+y+z", x+y occurs first and then followed by plus z.

However, doesn't enforcing left associativity causes infinite left recursion?

So far all solutions that I've checked are either left associative or don't have left recursion, but not both. Is it possible to have a grammar that have both of these properties?

Example:

Left Associative:

Expr = Term | Expr + Term
Term = Element | Term ∗ Element
Element = x|y|z|(Expr)

Left Recursion Eliminated:

Expr = Term ExprTail
ExprTail = epsilon | + Term ExprTail

Term = Element TermTail
TermTail = epsilon | * Element TermTail

Element = x|y|z|(Expr)

Any ideas?

Babak
  • 497
  • 1
  • 7
  • 15
  • You can eliminate left recusrion without losing left-associativity, and it is a standard procedure in LL parsers. The exponentiation operator, if you have one, is binary and right-associative. – user207421 Oct 20 '16 at 01:55
  • In C the assignment operators are right-associative, but that's it. – sepp2k Oct 20 '16 at 13:52
  • 2
    @EJP Are you saying there's a way to rewrite the grammar in the question without left-recursion in such a way that the generated parse tree will be left-associative without having to do any post-processing on the tree? If so: how? – sepp2k Oct 20 '16 at 15:46

1 Answers1

5

If an operator is left-associative, then the corresponding production will be left recursive.

If you use an LR parser generator, then there is no problem. The LR algorithm has no problem coping with left recursion (and little problem with any other kind of recursion although it might require a bit more stack space).

You could also use other bottom-up techniques, such as the classic operator-precedence algorithm (viz. shunting yard), but LR parsing is strictly more expressive and the parser generator makes implementation relatively simple.

If you insist on recursive descent parsing, that is possible because you can parse a repeated pattern using a loop (theoretically right recursive) but combine elements left-to-right. In some theoretic sense, this is a tree-rewrite of the AST, but I suspect that lots of programmers have coded that without noticing the tree fix-up.

rici
  • 234,347
  • 28
  • 237
  • 341
  • "you can parse a repeated pattern using a loop" In case of LL parser generators like ANTLR that translates to using repetition operators like `*` in the grammar. That will give you a parse tree with a list in it, which you can then loop over in whichever direction you want. – sepp2k Oct 20 '16 at 13:54
  • Thank you. Yes I am using recursive descent parsing. I am sensing using a loop is the right idea. Could you guide me a bit on where (how) to use the loop exactly? Is it as @sepp2k mentioned, so that for example, as long I have '+', I just add whatever element I have to a list, and then when addition is over, I return that list and recurse on it? Am I understanding this right? – Babak Oct 20 '16 at 17:39
  • 1
    @Babak If you're hand-writing the code, you don't need to go through a list. You can build the tree directly in the loop. Something like: `result = parsePrimaryExpression; if(result) while(parseToken(STAR)) { result = new Multiplication(result, parsePrimaryExpression()); } return result;`. You know, plus error handling and also handling the division operator etc. – sepp2k Oct 20 '16 at 17:51
  • 1
    @babak: basically as sepp2k says. Note that the reduction does nkt, strictly speaking, correspond to the grammar, since if you did a real tailcall, you would end up right associating. So passing the accumulator (`result` in sepp2k's example) into the loop effectively inverts the tree as you construct it. But it's a subtle point probably of little interest in practical coding. Personally, I would use an LR parser generator, but tastes differ. – rici Oct 20 '16 at 18:11
  • @sepp2k Thank you so much. I might be misunderstanding here, but isn't the code you wrote is right associating? So say we are parsing "x * y * z". Result will be x, and after going to the while loop, we are going to get: new Multiplication("x", parsePrimaryExpression()); Which I believe at the end will result in Multiplication("x", Multiplication("y","z")), no? – Babak Oct 20 '16 at 18:27
  • 1
    @Babak No, after the first iteration, result will be `Multiplication(x,y)`, so when it does `result = new Mult(result, parsePrimaryExpression())` a second time in the second iteration, it will end up as `Mult(Mult(x,y), z)`. – sepp2k Oct 20 '16 at 18:35
  • @sepp2k Ofcourse. Sorry, I was being stupid :) Yes, that did the trick! thank you so much both :) – Babak Oct 20 '16 at 19:19
  • A practical example of mixing bottom-up method in recursive descent can be found in [C in 4 Functions Repo](https://github.com/rswier/c4) expr function in c4.c It can take some patience to read because of the way the code is minified and divided into only 4 functions. This expr() function implements precedence climbing for a big subset of C expressions. – cardiff space man Aug 21 '17 at 20:42