I need to parse a simple language that I didn't design, so I can't change the language. I need the results in C#, so I've been using TinyPG because it's easy to use and doesn't require external libraries to run the generated parser.
Things had been going pretty well until I ran into this construct in the language (this is a simplified version, but it does show the problem):
EOF -> @"^\s*$";
[Skip] WHITESPACE -> @"\s+";
LIST -> "LIST";
END -> "END";
IDENTIFIER -> @"[a-zA-Z_][a-zA-Z0-9_]*";
Expr -> LIST IDENTIFIER+ END;
Start -> (Expr)+ EOF;
The resulting parser cannot parse this:
LIST foo BAR Baz END
because it greedily lexes END as an IDENTIFIER, instead of properly as the END keyword.
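To show the overlap outside of TinyPG, here is a minimal C# sketch that runs the two terminal patterns from the grammar directly through .NET's Regex class. The class name, the `^` anchors, and the `IdentifierNoKeywords` pattern are my own additions for illustration. The lookahead variant is only one possible way to stop IDENTIFIER from matching the keywords; I have not verified that it makes the generated TinyPG parser accept the example line.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeywordOverlapDemo
{
    // Terminal patterns copied from the grammar. The leading ^ is an
    // assumption added here so each pattern must match at the start of the input.
    public const string End = @"^END";
    public const string Identifier = @"^[a-zA-Z_][a-zA-Z0-9_]*";

    // Hypothetical variant (not from the original grammar): a negative
    // lookahead rejects the whole words LIST and END, while still allowing
    // identifiers that merely start with them, such as ENDING.
    public const string IdentifierNoKeywords =
        @"^(?!(?:LIST|END)\b)[a-zA-Z_][a-zA-Z0-9_]*";

    // True if the pattern matches the input.
    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    static void Main()
    {
        foreach (var word in new[] { "foo", "END", "ENDING", "LIST" })
        {
            Console.WriteLine(
                $"{word}: END={Matches(End, word)}, " +
                $"IDENTIFIER={Matches(Identifier, word)}, " +
                $"IDENTIFIER (no keywords)={Matches(IdentifierNoKeywords, word)}");
        }
    }
}
```

With the original patterns, the input END is matched by both END and IDENTIFIER, so a scanner that is only asked for IDENTIFIER inside the IDENTIFIER+ loop has no reason to stop there. With the lookahead variant, END and LIST no longer match as identifiers, but ENDING still does.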
So, here are my questions:
Is this grammar ambiguous or wrong for LL(1) parsing? Or is this a bug in TinyPG?
Is there any way to redesign the grammar such that TinyPG will properly parse the example line?
Are there any other suggestions for a simple parser that outputs code in C# and doesn't require additional libraries? I've looked at LLLPG and ANTLR4, but found them much more troublesome than TinyPG.