A newbie question. Say I have a lexer rule that simply lists all acceptable symbols:

ACCEPTED_SYMBOLS:   ('~' |'!' |'@' |'#' |'$' |'%' |'^' |'\-' |'\+' | '=' |
                  '\\'|':' |'\"'|'\''|'<' |'>' |',' |'.' |'?' | '/'  ) ;

But sometimes I want another rule that accepts all symbols except, say, the '=':

 ACCEPTED_SYMBOLS_EXCEPT_EQUAL: ('~' |'!' |'@' |'#' |'$' |'%' |'^' |'\-' |'\+' | 
                      '\\'|':' |'\"'|'\''|'<' |'>' |',' |'.' |'?' | '/'  ) ;

Basically, I just repeat the list without '='.

But this seems like a clumsy way to define tokens. What if I later need ACCEPTED_SYMBOLS_EXCEPT_HASH, ACCEPTED_SYMBOLS_EXCEPT_COLON, etc.?

Is it possible to write a parser rule that derives the matching symbols from ACCEPTED_SYMBOLS? A semantic predicate sounds like the right choice, but I am new to ANTLR and don't know how to use one.

JavaMan

1 Answer

Let's say that inside your rule a all ACCEPTED_SYMBOLS characters are valid, but inside rule b the = is not.

You could do this using a predicate like this:

a
  :  ACCEPTED_SYMBOLS
  ; 

b
  :  t=ACCEPTED_SYMBOLS {!$t.text.equals("=")}?
  ;

ACCEPTED_SYMBOLS
  :  '~'  | '!' | '@' | '#'  | '$' | '%' | '^' | '-' | '+' | '=' |
     '\\' | ':' | '"' | '\'' | '<' | '>' | ',' | '.' | '?' | '/' 
  ;

Note that only single quotes and backslashes need to be escaped inside a string literal in an ANTLR grammar.
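To make the behaviour of the predicate in rule b concrete, here is a toy model in Python. It is not ANTLR code and not what ANTLR generates: `FailedPredicateError`, `rule_a` and `rule_b` are hypothetical names, and each "token" is just one character. The point it illustrates is the order of events: the token is matched first, and only then is the predicate checked.

```python
class FailedPredicateError(Exception):
    """Hypothetical stand-in for ANTLR's FailedPredicateException."""


# The 20 characters listed in the ACCEPTED_SYMBOLS lexer rule.
ACCEPTED_SYMBOLS = set("~!@#$%^-+=\\:\"'<>,.?/")


def rule_a(ch):
    # Models:  a : ACCEPTED_SYMBOLS ;
    if ch not in ACCEPTED_SYMBOLS:
        raise ValueError(f"{ch!r} is not an ACCEPTED_SYMBOLS token")
    return ch


def rule_b(ch):
    # Models:  b : t=ACCEPTED_SYMBOLS {!$t.text.equals("=")}? ;
    t = rule_a(ch)   # the token is matched first ...
    if t == "=":     # ... and only afterwards is the predicate evaluated
        raise FailedPredicateError('!$t.text.equals("=")')
    return t


print(rule_b("#"))   # accepted by both a and b
try:
    rule_b("=")      # accepted by a, rejected by b's predicate
except FailedPredicateError as e:
    print("predicate failed:", e)
```

In this model a '=' inside rule b is an error rather than a silent fallback to some other alternative.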

Or, without a predicate:

a
  :  any
  ; 

b
  :  SYMBOLS
  ;

any
  :  SYMBOLS 
  |  EQ
  ;

SYMBOLS
  :  '~'  | '!' | '@' | '#'  | '$' | '%' | '^' | '-' | '+' |
     '\\' | ':' | '"' | '\'' | '<' | '>' | ',' | '.' | '?' | '/' 
  ;

EQ
  :  '='
  ;
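The same idea can be sketched as a toy model in Python (again not ANTLR code; `token_type`, `rule_any` and `rule_b` are hypothetical names, and every token is a single character). Because '=' gets its own token type, the parser rules can tell the two cases apart by token type alone, with no predicate involved.

```python
# Characters of the SYMBOLS lexer rule: everything except '='.
SYMBOLS = set("~!@#$%^-+\\:\"'<>,.?/")
EQ = {"="}


def token_type(ch):
    """Toy lexer: map one character to the name of its token type."""
    if ch in SYMBOLS:
        return "SYMBOLS"
    if ch in EQ:
        return "EQ"
    raise ValueError(f"no lexer rule matches {ch!r}")


def rule_any(ch):
    # Models:  any : SYMBOLS | EQ ;
    return token_type(ch) in ("SYMBOLS", "EQ")


def rule_b(ch):
    # Models:  b : SYMBOLS ;
    return token_type(ch) == "SYMBOLS"


print(rule_any("="), rule_b("="))
print(rule_any("#"), rule_b("#"))
```

The two token sets do not overlap, so the lexer never has to choose between them.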

EDIT

Note that you cannot define the rules in the following order:

ACCEPTED_SYMBOLS:   ('~' |'!' |'@' |'#' |'$' |'%' |'^' |'-' |'+' | '=' |
                  '\\'|':' |'"'|'\''|'<' |'>' |',' |'.' |'?' | '/'  ) ;

ACCEPTED_SYMBOLS_EXCEPT_EQUAL: ('~' |'!' |'@' |'#' |'$' |'%' |'^' |'-' |'+' | 
                      '\\'|':' |'"'|'\''|'<' |'>' |',' |'.' |'?' | '/'  ) ;

ANTLR will report an error that the token ACCEPTED_SYMBOLS_EXCEPT_EQUAL can never be created, since prior rule(s) already match everything ACCEPTED_SYMBOLS_EXCEPT_EQUAL can match.

And if you switched the rules:

ACCEPTED_SYMBOLS_EXCEPT_EQUAL: ('~' |'!' |'@' |'#' |'$' |'%' |'^' |'-' |'+' | 
                      '\\'|':' |'"'|'\''|'<' |'>' |',' |'.' |'?' | '/'  ) ;

ACCEPTED_SYMBOLS:   ('~' |'!' |'@' |'#' |'$' |'%' |'^' |'-' |'+' | '=' |
                  '\\'|':' |'"'|'\''|'<' |'>' |',' |'.' |'?' | '/'  ) ;

then the rule ACCEPTED_SYMBOLS can only ever match a '='. All other characters will be tokenized as ACCEPTED_SYMBOLS_EXCEPT_EQUAL tokens.

You must realize that the lexer operates independently of the parser: it simply creates tokens by going through the lexer rules from top to bottom, trying to match as much as possible, and it does not care what the parser is trying to match at that time.
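A toy first-match-wins tokenizer in Python shows the effect of the switched rule order. This is a deliberate simplification, not how ANTLR's generated lexer is implemented: `tokenize` and `rules` are hypothetical names, tokens are single characters, and the only thing modelled is that the tokenizer picks the first rule that matches without consulting any parser.

```python
def tokenize(text, rules):
    """Label each character with the first rule (top to bottom) that matches it."""
    tokens = []
    for ch in text:
        for name, chars in rules:
            if ch in chars:
                tokens.append((name, ch))
                break
        else:
            raise ValueError(f"no lexer rule matches {ch!r}")
    return tokens


WITHOUT_EQ = "~!@#$%^-+\\:\"'<>,.?/"

# Same order as the switched grammar: the narrower rule comes first.
rules = [
    ("ACCEPTED_SYMBOLS_EXCEPT_EQUAL", WITHOUT_EQ),
    ("ACCEPTED_SYMBOLS", WITHOUT_EQ + "="),
]

print(tokenize("#=", rules))
# → [('ACCEPTED_SYMBOLS_EXCEPT_EQUAL', '#'), ('ACCEPTED_SYMBOLS', '=')]
```

Only the '=' ever reaches the second rule; every other character is claimed by the first one, whatever the parser might have wanted at that point.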

Bart Kiers
  • Thanks. Is this called a "disambiguating semantic predicate", i.e. having the {expr}? placed AFTER the element? I'm reading chapter 12 of the ANTLR reference book, but I'm confused by the difference between validating and disambiguating semantic predicates. Both use the {..}? syntax without the '=>'. – JavaMan Sep 18 '11 at 11:48
  • @JavaMan, the `{boolean-expression}?` is a _validating semantic predicate_. It's first explained in **Ch 12.1**, on page 285 of the printed book (the PDF has a different page numbering). – Bart Kiers Sep 18 '11 at 11:58
  • But then what is a *Disambiguating Semantic Predicate*? And how is it different from a *Validating Semantic Predicate*? Syntactically, they look the same. – JavaMan Sep 18 '11 at 12:01
  • @JavaMan, AFAIK, a _disambiguating semantic predicate_ is pretty much the same as a _validating semantic predicate_. However, a _disambiguating_ variant is placed at the start of a parser-rule and when the expression in it fails, no error is thrown, but the parser chooses a different alternative. Whereas a _validating semantic predicate_ would throw an error after certain parts of the rule were already matched. – Bart Kiers Sep 18 '11 at 12:54
  • If a *Validating Semantic Predicate* throws a FailedPredicateException when it evaluates to false, would this exception cause the whole parse to fail? – JavaMan Sep 18 '11 at 15:44
  • @JavaMan, FYI, I added an example of a _disambiguating semantic predicate_ here: http://stackoverflow.com/questions/3056441/what-is-a-semantic-predicate-in-antlr – Bart Kiers Sep 21 '11 at 06:02
  • If the example given is a *Validating semantic predicate* and it will cause the whole parse to fail by throwing an exception, then how can we use it to exclude some symbols? Say, if a character matches the rule 'ACCEPTED_SYMBOLS' above but is not a '=', shouldn't it cause a FailedPredicateException and the whole parsing halt? – JavaMan Sep 25 '11 at 07:12
  • _"shouldn't it cause a FailedPredicateException and the whole parsing halt?"_, my first example does just that. – Bart Kiers Sep 25 '11 at 08:18