antlr how to define optional parts in any order

Question

Suppose need the grammar to parse the following templates:

1. REPORT
2. BEGIN
3.   QUERY
4.   BEGIN
5.     AGGREGATION: day
6.     DIMENSION: department
7.   END
8. END

Where line #5 and #6 are optional and the order of the 2 lines doesn't matter. How can I specify this in my grammar file? Below is my solution (see line #12):

1. grammar PRL;
2. report
3.  : REPORT
4.      BEGIN 
5.          query
6.      END
7.  ;
8.
9. query
10.  : QUERY 
11.     BEGIN
12.         (aggregation_decl dimension_decl | dimension_decl aggregation_decl)? 
13.     END
14. ;

So it works, however it looks ugly, and if I have more than 2 parts it's going to become unmanageable very quickly? Any advice?

score 0 · Answer 1 · answered Jun 26 '12 at 03:49

0

Something like this? Generally you would enforce only one of each item exists at a later processing step. Otherwise, as you see, the grammar gets unwieldy.

grammar PRL;
report
  : REPORT
      BEGIN 
          query
      END
  ;

query
  : QUERY 
     BEGIN
       body_decl* 
     END
 ;

body_decl :
   aggregation_decl dimension_decl
 | dimension_decl aggregation_decl;

answered Jun 26 '12 at 03:49

hmm.. didn't see much difference from the solution given out in the question. Is there a way to avoid enumerate all combination of the parts? say now 2 parts have 2 combinations, 3 will become 6 and 4 will be 24, ... – Gelin Luo Jun 26 '12 at 04:00

score 0 · Answer 2 · edited May 23 '17 at 11:46

As already mentioned by Adam: this generally something done after the parser has created some sort of (abstract) parse tree. You simply collect all types of declarations like this:

grammar PRL;

report
 : REPORT BEGIN query END
 ;

query
 : QUERY BEGIN decl* END
 ;

decl
 : NAME ':' NAME
 ;

REPORT : 'REPORT';
BEGIN  : 'BEGIN';
END    : 'END';
QUERY  : 'QUERY';
NAME   : ('a'..'z' | 'A'..'Z')+;

SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

and after that, check if there are duplicates in decl* in your AST.

But if you really want to do this during parsing, you need to grab the left hand side of decl and add these in a Set and when you stumble upon a duplicate, throw a predicate exception:

grammar PRL;

@parser::header {
  import java.util.Set;
  import java.util.HashSet;
}

report
 : REPORT BEGIN query END
 ;

query
 : QUERY BEGIN unique_decls END
 ;

unique_decls
@init{Set<String> set = new HashSet<String>();}
 : (decl {set.add($decl.key)}?)*
 ;

decl returns[String key]
 : k=NAME ':' NAME {$key = $k.text;}
 ;

REPORT : 'REPORT';
BEGIN  : 'BEGIN';
END    : 'END';
QUERY  : 'QUERY';
NAME   : ('a'..'z' | 'A'..'Z')+;

SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

The {set.add($decl.key)}?, called a Validating Semantic Predicates, will throw an exception when the code inside it (set.add($decl.key)) evaluates to false. In this case, it evaluates to false whenever the set already contains a certain key.

antlr how to define optional parts in any order

2 Answers2