
I am trying to match (and ignore) C-style block comments. As I see it, the sequence is (1) /*, followed by (2) anything other than /* or */, until (3) */.

BLOCK_COMMENT_START
    : "/*"
    ;

BLOCK_COMMENT_END
    : "*/"
    ;

BLOCK_COMMENT
    : BLOCK_COMMENT_START ( ~( BLOCK_COMMENT_START | BLOCK_COMMENT_END ) )* BLOCK_COMMENT_END {
        // again, we want to skip the entire match from the lexer stream
        $setType( Token.SKIP );
    }
    ;

But ANTLR does not think like I do ;)

sql-stmt.g:121:34: This subrule cannot be inverted.  Only subrules of the form:
    (T1|T2|T3...) or
    ('c1'|'c2'|'c3'...)
may be inverted (ranges are also allowed).

The error message is a little cryptic, but I think it is trying to say that only ranges, single-character alternatives, or token alternatives can be negated. But isn't that what I have? Both BLOCK_COMMENT_START and BLOCK_COMMENT_END are tokens. What am I missing?

Thanks a lot for any help.

Steve Ebersole
  • https://www.antlr2.org/doc/metalang.html#_bb13 - so it seems (the message being cryptic) that the token form (`~(T1|...)`) can only be used in the parser, while the character form (`~('c1'|...)`) can only be used in lexers. The doc does not say so explicitly, but its examples seem to imply it. So while essentially the same grammar works in ANTLR 4, maybe I need to move these kinds of rules out of the lexer and into the parser for v2? – Steve Ebersole May 15 '20 at 16:49
  • Ah, yes, you can only do `~( A | B )` inside a lexer rule if both `A` and `B` match a single char. The error is cryptic indeed, and I think that the numbers in `('c1'|'c2'|'c3'...)` are really meant as subscripts (e.g. `('c'|'t'|'a'...)`). And ANTLR v4 doesn't even allow `~( A | B )`, producing the error: `error(183): ./test/src/main/antlr4/test/T.g4:14:31: rule reference A is not currently supported in a set` – Bart Kiers May 18 '20 at 08:58
  • Btw, have you seen this: https://theantlrguy.atlassian.net/wiki/spaces/ANTLR3/pages/2687360/How+do+I+match+multi-line+comments ? – Bart Kiers May 18 '20 at 08:59
  • Thanks @BartKiers! Yeah, the working ANTLR 4 version does it very differently – Steve Ebersole May 19 '20 at 12:20
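
Following the comments above, here is a minimal sketch of how the rule might be written for an ANTLR 2 lexer: negate only single characters, and use a semantic predicate on `LA(2)` to decide whether a `*` ends the comment. It is modelled on the multi-line comment rule in the Java grammar distributed with ANTLR 2. Whether it drops into sql-stmt.g unchanged is an assumption and has not been tested here. The rule name just mirrors the one in the question.

```antlr
// Sketch only: ANTLR 2 lexer rule that skips C-style block comments.
BLOCK_COMMENT
    : "/*"
      ( // '\r' '\n' can be matched by one alternative or by two iterations,
        // so silence the ambiguity warning for this subrule
        options { generateAmbigWarnings=false; }
      : { LA(2) != '/' }? '*'        // a '*' that does not start the closing "*/"
      | '\r' '\n'   { newline(); }   // keep the lexer's line counter accurate
      | '\r'        { newline(); }
      | '\n'        { newline(); }
      | ~( '*' | '\n' | '\r' )       // single-char set, so it can be inverted
      )*
      "*/"
      {
          // skip the entire match from the lexer stream
          $setType( Token.SKIP );
      }
    ;
```

Unlike the rule in the question, this does not reject a `/*` inside a comment; it treats it as ordinary text, which matches how C handles it. In ANTLR 2 the `~` operator complements relative to the lexer's character vocabulary, so the `charVocabulary` option may need to cover every character that can appear inside a comment.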
