Context
I am working on a parser for a language that can include references to database fields. The field's type affects how the input can be parsed, but nothing distinguishes the types during lexing (they all use the syntax [myDBField]), so these references are all lexed into a single token type.
The problem
My hope was to then use semantic predicates in the parser rules to look up the field's type and ensure only valid rules are considered. Here is a representative grammar:
expression : intDbField | dateDbField ;
intDbField : {$parser.isInt($text)}? DatabaseField ;
dateDbField : {$parser.isDate($text)}? DatabaseField ;
DatabaseField : '[' [A-Za-z]+ ']' ;
(I am using the Python target for the generated parser, subclassing the generated parser and implementing the isInt() and isDate() methods there.)
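For reference, the predicate helpers look roughly like the sketch below. This is a simplified stand-in rather than my actual code: the FIELD_TYPES table and the FieldTypePredicates class name are hypothetical, and in the real setup these methods live on a subclass of the ANTLR-generated parser instead of a standalone class.

```python
# Hypothetical schema: field name -> type. In the real parser this lookup
# would go to the database metadata instead of a hard-coded dict.
FIELD_TYPES = {"age": "int", "birthday": "date"}


class FieldTypePredicates:
    """Predicate helpers; in the real setup these would be methods on a
    subclass of the ANTLR-generated parser."""

    def _field_type(self, text):
        # Strip the surrounding brackets: "[age]" -> "age".
        # Empty or missing text yields an empty name, which has no type.
        name = text.strip()[1:-1] if text else ""
        return FIELD_TYPES.get(name)

    def isInt(self, text):
        return self._field_type(text) == "int"

    def isDate(self, text):
        return self._field_type(text) == "date"
```

Both helpers return False when the text they receive is empty or names an unknown field.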
However, it seems that if the semantic predicate returns false when attempting to match intDbField, the parser doesn't back up to the expression rule to try dateDbField; instead it reports an error at that point (having apparently exhausted all matching options).
A similar example that does work
If I change my grammar to this:
expression : {$parser.isInt($text)}? DatabaseField # intDbField
| {$parser.isDate($text)}? DatabaseField # dateDbField
;
DatabaseField : '[' [A-Za-z]+ ']' ;
then the semantic predicates work as expected, and a date database field will be correctly parsed as the second alternative.
Unfortunately my actual grammar is a lot more complicated than the minimal example above, so it's not possible to have the alternatives which are determined by semantic predicates all listed in one rule like this.
Question
Are 'disambiguating' semantic predicates like this limited to disambiguating alternatives within a rule, or is there a way to make the parser backtrack 'further up' to then try other alternatives?
Also, is there any documentation for the language-agnostic aspects of the semantic predicates here?
(I followed the guidance in the Python target documentation about target-agnostic grammars, but have been looking for more complete documentation, e.g. about the special variables available here like $parser and $text.)