
Context

I am working on a parser for a language that may include references to database fields. A field's type affects how the input can be parsed, but nothing distinguishes between the types during lexing (they all use the syntax [myDBField]), so these references are all lexed into a single token type.

The problem

My hope was to then use semantic predicates in the parser rules to look up the field's type and ensure that only valid rules are considered. Here is a representative grammar:

expression : intDbField | dateDbField ;

intDbField : {$parser.isInt($text)}? DatabaseField ;
dateDbField : {$parser.isDate($text)}? DatabaseField ;

DatabaseField : '[' [A-Za-z]+ ']' ;

(I am using the Python target, subclassing the generated parser and implementing the isInt() and isDate() methods in the subclass.)
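
A minimal sketch of those two helpers, assuming a hard-coded field-to-type mapping in place of a real database lookup. The names FIELD_TYPES and FieldTypePredicates are hypothetical, and the class is written as a standalone mixin so that it runs without the ANTLR runtime; in the real parser it would be combined with the generated parser class.

```python
# Hypothetical schema: field name -> type name. A real implementation would
# query the database metadata instead of using a hard-coded dict.
FIELD_TYPES = {"age": "int", "createdOn": "date"}


class FieldTypePredicates:
    """Predicate helpers intended to be called from the grammar.

    Assumption: in the real project this is mixed into a subclass of the
    ANTLR-generated parser; it is standalone here so the sketch is runnable.
    """

    field_types = FIELD_TYPES

    def _field_type(self, text):
        # Accept the raw token text, e.g. "[age]", and strip the brackets.
        name = text.strip()
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
        # Unknown (or empty) names map to None, so both predicates are False.
        return self.field_types.get(name)

    def isInt(self, text):
        return self._field_type(text) == "int"

    def isDate(self, text):
        return self._field_type(text) == "date"


checker = FieldTypePredicates()
print(checker.isInt("[age]"))         # True
print(checker.isDate("[createdOn]"))  # True
print(checker.isInt("[unknown]"))     # False
```

Both helpers return False for a name that is not in the mapping, so an unknown field satisfies neither predicate.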

However, it seems that if the semantic predicate returns false while attempting to match intDbField, the parser doesn't back up to the expression rule to try dateDbField. Instead it reports an error at that point (having apparently exhausted all matching options).

A similar example that does work

If I change my grammar to this:

expression : {$parser.isInt($text)}? DatabaseField   # intDbField
           | {$parser.isDate($text)}? DatabaseField  # dateDbField
           ;

DatabaseField : '[' [A-Za-z]+ ']' ;

then the semantic predicates work as expected, and a date database field will be correctly parsed as the second alternative.

Unfortunately my actual grammar is a lot more complicated than the minimal example above, so it's not possible to have the alternatives which are determined by semantic predicates all listed in one rule like this.

Question

Are 'disambiguating' semantic predicates like this limited to disambiguating alternatives within a rule, or is there a way to make the parser backtrack 'further up' to then try other alternatives?

Also, is there any documentation for the language-agnostic aspects of the semantic predicates here?

(I followed the guidance in the Python target documentation about target-agnostic grammars, but have been looking for more complete documentation e.g. about the special variables available here like $parser and $text.)


Tim
  • I don't know the answer to the backtracking issue, but I can address "target agnostic". There isn't really a single format for actions that works in all targets, because targets do not share the same syntax for a method call. But you can get close and use a script to edit the syntax to get the rest of the way. A good example is the universal Python grammar https://github.com/antlr/grammars-v4/tree/master/python/python. Basically, push all your code for the action into a method or function, then use `{ this.FooBar() }` in your grammar. Prior to compiling, patch per target if needed with the script transformGrammar.py. – kaby76 Nov 03 '22 at 14:34
  • Thanks @kaby76 - that's helpful, and the universal Python grammar is an interesting example. My question was actually more basic(!), as I've since discovered that the 'special variables' are attributes (closely connected with the "actions" you referred to). So what I was really after was the attribute tables here: https://github.com/antlr/antlr4/blob/master/doc/actions.md – Tim Nov 03 '22 at 17:08
