1

I am writing a grammar for a programming language, but I'm running headfirst into a shift/reduce problem. The problem can be found in the state:

fn_call -> ID . L_PAREN fn_args R_PAREN
assignment -> ID . ASSIGN value
assignment -> ID . ASSIGN container
value -> ID

Before explaining a bit further, I want to clarify:

Is this shift/reduce because the program can't determine if I am calling a function or using the ID as a value (eg. constant or variable)?

Moving on, is it possible to fix this? My language does not currently use line delimiters (such as ';' in C or '\n' in Python). The parser is LALR(1).

What is the most efficient (adding the fewest rules to the grammar) way to decipher between a function call or a variable with line delimiters?

EDIT: Here is the lookahead for that state.

  ! shift/reduce conflict for L_PAREN resolved as shift
    L_PAREN         shift and go to state 60
    ASSIGN          shift and go to state 61
    COMMA           reduce using rule 43 (value -> ID .)
    R_PAREN         reduce using rule 43 (value -> ID .)
    DASH            reduce using rule 43 (value -> ID .)
    R_BRACE         reduce using rule 43 (value -> ID .)
    NONE            reduce using rule 43 (value -> ID .)
    DEFN            reduce using rule 43 (value -> ID .)
    FOR             reduce using rule 43 (value -> ID .)
    INT_T           reduce using rule 43 (value -> ID .)
    DBL_T           reduce using rule 43 (value -> ID .)
    STR_T           reduce using rule 43 (value -> ID .)
    ID              reduce using rule 43 (value -> ID .)
    INT             reduce using rule 43 (value -> ID .)
    DBL             reduce using rule 43 (value -> ID .)
    STR             reduce using rule 43 (value -> ID .)
    COMMENT_LINE    reduce using rule 43 (value -> ID .)
    L_BRACE         reduce using rule 43 (value -> ID .)
    SET             reduce using rule 43 (value -> ID .)

  ! L_PAREN         [ reduce using rule 43 (value -> ID .) ]
sdasdadas
  • 23,917
  • 20
  • 63
  • 148

2 Answers2

2

The following is just guesswork since you haven't shown much of your grammar. I'm assuming that you allow expressions as statements, and not just function calls. In that case, an expression can start with an (, and a statement can end with an ID. Since you have no statement delimiters (I think), then the following is truly ambiguous:

a = b
(c + d)

After reading the b (ID), it is unclear whether to reduce it to value, as part of the assignment, or to leave it as an ID and shift the ( as part of fn_call.

You can't remove ambiguity by adding productions. :)

rici
  • 234,347
  • 28
  • 237
  • 341
  • You're right about the expressions - I apologize but my grammar might be too long / annoying to include. Is there another way to go about splitting statements without delimiters? That seems silly to ask, but I'll ask it anyway. – sdasdadas Jul 16 '13 at 07:11
  • 2
    @sdasdadas: I just wrote an answer to this question: http://stackoverflow.com/questions/17646002/representing-statement-terminating-newlines-in-a-grammar which is vaguely relevant. But the short answer is: design your language so that there are no ambiguities, or forbid call expressions to have a newline between the function and the `(` (for that, see my other answer). – rici Jul 16 '13 at 07:14
1

If this is the set of items that form a "state" of the parser, then you haven't written it down right:

fn_call -> ID . L_PAREN fn_args R_PAREN
assignment -> ID . ASSIGN value
assignment -> ID . ASSIGN container
value -> ID .  *missing lookahead set*

You don't exhibit the rest of your language, so we cannot know what the lookahead set is for the rule

 value -> ID

Under the assumption that you indeed have a shift-reduce conflict in this state, then the lookahead set must contain "ASSIGN" or "L_PAREN". I can't tell you how to fix your problem without knowing more.

Given that your present grammar has these issues, you cannot fix this simply "adding rules" of any kind, whether they involve line delimiters or not, because adding rules will not change what is already in lookahead sets (it may add more tokens to existing sets).

EDIT: One way out of your problem may be to switch parsing technologies. Your problem is the LALR parsers cannot handle the local ambiguity that you seem to have. However, your overall grammar may not have an actual ambiguity if you look further ahead. That depends on your language syntax but you are rolling you own so you can do as you please. I suggest looking into GLR parsing technology, which can handle arbitrary lookahead; check out recent versions of Bison.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • I've included the lookahead rules. The problem, as far as I understand, is that there is a conflict between wanting to reduce to a `value` or shifting to either `rule 60` or `rule 61`. I believe my title is wrong - I will update it. – sdasdadas Jul 16 '13 at 07:07
  • Ah, I remember why I was curious about line delimiters. Because if you had the rule: `value -> ID ';'` then the problem would disappear. – sdasdadas Jul 16 '13 at 07:09
  • ... at which point the LR item value -> ID ; would have no conflict, right. – Ira Baxter Jul 16 '13 at 07:14
  • Thanks (I just caught your edit) - between you and @rici I believe that I understand the possible options. – sdasdadas Jul 16 '13 at 07:18