Translating a grammar file in a hand written parser

Question

I've been trying to write my own compiler for educational purposes and I'm stuck on an issue. I've taken the recursive descent approach with some previous knowledge on lex and yacc/bison.

So far I'm just trying to handle the parsing aspect without regards to the generation of the AST or code generation.

I'm trying to write the expression parsing for this particular grammar file part

primary_expression
    : IDENTIFIER
    | CONSTANT
    | STRING_LITERAL
    | '(' expression ')'
    ;

postfix_expression
    : primary_expression
    | postfix_expression '[' expression ']'
    | postfix_expression '(' ')'
    | postfix_expression '(' argument_expression_list ')'
    | postfix_expression '.' IDENTIFIER
    | postfix_expression PTR_OP IDENTIFIER
    | postfix_expression INC_OP
    | postfix_expression DEC_OP
    ;

So far I have this code

void Parser::primaryExpression()
{
    if (accept(Token::eIdentifier))
    {

    }
    else if (accept(Token::eIntNumber))
    {

    }
    else if (accept('('))
    {
        expression();
        expect(')');
    }
}
void Parser::postfixExpression()
{

}

I'm having some problems dealing with the recursiveness of the postfix_expression and I don't know how to continue with postfixExpression function.

I'm under the impression that for a recursive descent parser, I should probably arrange my grammar in a different way.

Could anyone point me in the right direction?

See my SO answer on how to write a recursive descent parser: http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 — Ira Baxter, May 29 '16 at 15:09

score 2 · Answer 1 · answered May 29 '16 at 13:35

Note that postfix_expression always parses primary_expression first, so the first order of business is primaryExpression().

Then, if the next character is any of the characters that follow the recursive postfix_expression in the remaining seven rules, then you are parsing the postfix_expression. Which gets you another posfix_expression, so you repeat again.

I won't write the C++ code for you, but in pseudocode:

postfixExpression()
{
    primaryExpression();
    while (next character is any of the characters that follow
           postfix_expression in the remaining seven rules)
    {
         parse_the_appropriate_rule();
    }
}

Chris Dodd · Accepted Answer · 2016-05-30T16:51:40.530

Left-recursion is difficult to handle in an LL (recursive descent) parser -- you need to recognize at and change it into a loop rather than a recursive call. In general terms, you want to refactor the left-recursion into

A → α | A β

and then your recusive descent routine becomes

parseA() {
    parseAlpha();
    while (lookaheadMatchesBeta())
        parseBeta();
}

Note that this requires enough lookahead to distinguish between FIRST(β) and FOLLOW(A), in order to find the end of all the trailing things that can match β

This is the same as the process for eliminating left recursion in an LL grammar -- you are effectively replacing the rule above with

A → α A'
A'→ ε | β A'

and then replacing the tail-recursive call in parseAPrime with a loop and inlining it into parseA.

Doing that with your grammar and using the accept/expect technique your code above uses, you get something like:

void Parser::postfixExpression() {
    primaryExpression();
    while (true) {
        if (accept('[')) {
            expression();
            expect(']');
        } else if (accept('(')) {
            if (accept(')')) {
            } else {
                argumentExpressionList();
                expect(')'); }
        } else if (accept('.')) {

⋮

        } else if (accept(Token::DEC_OP)) {
        } else {
            break;
        }
    }
}

Translating a grammar file in a hand written parser

2 Answers2