How can the negation meta-character, ~, be used in ANTLR's lexer- and parser rules?

Bart Kiers

1 Answer

Negating can occur inside lexer and parser rules.

Inside lexer rules you can negate characters, and inside parser rules you can negate tokens (lexer rules). In both cases, though, only a single character or a single token can be negated, never a sequence of them.

A couple of examples:

lexer rules

To match one or more characters except lowercase ASCII letters, you can do:

NO_LOWERCASE : ~('a'..'z')+ ;

(the negation-meta-char, ~, has a higher precedence than the +, so the rule above equals (~('a'..'z'))+)
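As a rough analogy only (Python regular expressions, not ANTLR), a negated character class followed by + behaves the same way: the class is negated first, and the result is then repeated. The names `NO_LOWERCASE` and `matches_no_lowercase` below are made up for this sketch.

```python
import re

# Regex analogue of  NO_LOWERCASE : ~('a'..'z')+ ;
# [^a-z] is the negated single-character set; the + then repeats it,
# which mirrors (~('a'..'z'))+ rather than ~(('a'..'z')+).
NO_LOWERCASE = re.compile(r"[^a-z]+")


def matches_no_lowercase(text):
    """Return True if text is one or more characters, none of them a-z."""
    return NO_LOWERCASE.fullmatch(text) is not None


print(matches_no_lowercase("ABC 123"))  # no lowercase ASCII letters
print(matches_no_lowercase("ABc"))      # contains the lowercase letter 'c'
```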

Note that 'a'..'z' matches a single character (and can therefore be negated), but the following rule is invalid:

ANY_EXCEPT_AB : ~('ab') ;

Because 'ab' (obviously) matches two characters, it cannot be negated. To match a token that consists of two characters, but is not 'ab', you'd have to do the following:

ANY_EXCEPT_AB
  :  'a' ~'b' // 'a' followed by any char other than 'b'
  |  ~'a' .   // any char other than 'a', followed by any char
  ;
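The two alternatives above can be sanity-checked with a regular-expression analogy in Python. This is a sketch, not ANTLR: it assumes the lexer's `.` and `~'x'` also accept a newline, which is why `re.DOTALL` is used, and the helper name `is_any_except_ab` and the sample alphabet are made up for illustration.

```python
import re
from itertools import product

# Regex analogue of the two alternatives:
#   'a' ~'b'   ->  a[^b]
#   ~'a' .     ->  [^a].
# re.DOTALL makes '.' match a newline too (an assumption of this sketch).
ANY_EXCEPT_AB = re.compile(r"a[^b]|[^a].", re.DOTALL)


def is_any_except_ab(text):
    """Return True if text is exactly two characters and is not 'ab'."""
    return ANY_EXCEPT_AB.fullmatch(text) is not None


# Exhaustively try every two-character string over a small sample alphabet
# and collect the ones the pattern rejects.
alphabet = "abc\n"
rejected = [x + y for x, y in product(alphabet, repeat=2)
            if not is_any_except_ab(x + y)]
print(rejected)  # only 'ab' should be listed
```

Splitting on the first character is what makes this work: either the first character is 'a' (so the second must not be 'b'), or it is not 'a' (so the second can be anything).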

parser rules

Inside parser rules, ~ negates a single token, or a set of alternative single tokens. For example, you have the following tokens defined:

A : 'A';
B : 'B';
C : 'C';
D : 'D';
E : 'E';

If you now want to match any token except the A, you do:

p : ~A ;

And if you want to match any token except B and D, you can do:

p : ~(B | D) ;

However, if you want to match any two tokens other than A followed by B, you cannot do:

p : ~(A B) ;

Just as with lexer rules, you cannot negate more than a single token. To accomplish the above, you need to do:

p
  :  A ~B
  |  ~A .
  ;

Note that the . (DOT) char in a parser rule does not match any character as it does inside lexer rules. Inside parser rules, it matches any token (A, B, C, D or E, in this case).
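To make the token-level semantics concrete, here is a small Python model (not ANTLR, and not how ANTLR is implemented) that treats ~ as a set complement over the five token types and `.` as "any token". The names `TOKENS`, `negate` and `matches_p` are hypothetical helpers for this sketch.

```python
from itertools import product

# The token types defined in the example grammar.
TOKENS = {"A", "B", "C", "D", "E"}


def negate(*excluded):
    """Model ~(X | Y | ...) as all token types minus the excluded ones."""
    return TOKENS - set(excluded)


def matches_p(first, second):
    """Model  p : A ~B | ~A . ;  applied to a two-token input."""
    alt1 = first == "A" and second in negate("B")      # A ~B
    alt2 = first in negate("A") and second in TOKENS   # ~A .  ('.' = any token)
    return alt1 or alt2


# ~(B | D) leaves the remaining token types.
print(sorted(negate("B", "D")))

# Try every two-token sequence and collect the ones p rejects.
rejected = [pair for pair in product(sorted(TOKENS), repeat=2)
            if not matches_p(*pair)]
print(rejected)  # only the pair (A, B) should be listed
```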

Note that you cannot negate parser rules. The following is illegal:

p : ~a ;
a : A  ;
  • Thanks for the clarification. I was not aware that the `~` operator would apply to tokens when occurring in a parser rule. – Gunther Nov 27 '11 at 13:17
  • @Gunther, no problem. I often mention it briefly in my answers, so from now on I can link to this Q&A. W.r.t. your converter, perhaps you're already using it, but perhaps not: the `org.antlr.tool.Strip` class removes all custom code from ANTLR grammar files which may make your life easier when parsing ANTLR grammars. – Bart Kiers Nov 27 '11 at 15:07