Given input with repeating BLOCK
s, where each block has repeating BEGIN EVENT
and END EVENT
entries (END EVENT
always follows a BEGIN EVENT
):
[TIMESTAMP] BLOCK
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
...
[TIMESTAMP] BLOCK
How do you disambiguate this grammar with LR(1)? I'm using LALRPOP, and the minimum example of this is:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Block = BlockHeader (Begin End)+;
pub Blocks = Block*
Because LR(1) can only look one token ahead though, this grammar is ambigious, as LALRPOP helpfully tells you (partial error):
Local ambiguity detected
The problem arises after having observed the following symbols in the input:
BlockHeader (Begin End)+
At that point, if the next token is a `"[TIMESTAMP]"`, then the parser can proceed in two different ways.
First, the parser could execute the production at
/home/<snip>.lalrpop:51:9: 51:32, which would consume
the top 2 token(s) from the stack and produce a `Block`. This might then yield a parse tree like
BlockHeader (Begin End)+ Block
├─Block────────────────┤ │
├─Block+───────────────┘ │
└─Block+─────────────────────┘
Alternatively, the parser could shift the `"[TIMESTAMP]"` token and later use it to construct a
`Timestamp`. This might then yield a parse tree like
(Begin End)+ "[TIMESTAMP]" "BEGIN" "EVENT" End
│ ├─Timestamp─┘ │ │
│ └─Begin─────────────────────┘ │
└─(Begin End)+───────────────────────────────┘
I see that it is telling me, after parsing a BlockHeader, Begin and End it can't figure out if the next token is another Begin, or the start of another Block. I have not found a way to disambiguate this in LR(1), but I can only assume this is a lack of understanding on my part though, and not an inherit limitation of LR(1) grammars?