I have the following ABNF rule for string definition.

STRING   = ALPHA *(allowedchar)

allowedchar   = "-" / "_" / DIGIT / ALPHA

ALPHA = A-Z / a-z

Valid tokens:

aa1
a_1___a
a23
a
a-1
a_a 

(If the first character is alphabetic, the rest can be any characters from 'allowedchar'.)

Invalid tokens:

-e
--
-1
-a
--1
--a
1 

(These do not start with an alphabetic character.)
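
As a cross-check that is independent of ANTLR, the ABNF rule above can be sketched as a plain Java regular expression. This is only an illustration: the class name StringTokenCheck and the method isValid are hypothetical, and the pattern assumes ALPHA means the ASCII letters A-Z and a-z.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the grammar: checks a token against
// STRING = ALPHA *(allowedchar), assuming ALPHA is ASCII A-Z / a-z.
class StringTokenCheck {
    // One letter, then any number of letters, digits, '-' or '_'.
    private static final Pattern STRING = Pattern.compile("[A-Za-z][A-Za-z0-9_-]*");

    static boolean isValid(String token) {
        // matches() requires the whole input to match, not just a prefix or suffix.
        return STRING.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String t : List.of("aa1", "a_1___a", "a-1", "-e", "--a", "__a", "1")) {
            System.out.println(t + " -> " + isValid(t));
        }
    }
}
```

With these inputs, only "aa1", "a_1___a" and "a-1" are reported as valid; "--a" and "__a" are rejected because the whole input has to match, starting from the first character.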

So far, my grammar works for all of the inputs (both valid and invalid) except "--a" and "__a".

ANTLR4 accepts these tokens as valid strings.

I am not sure why this is not working.

My Grammar

STRING : ALPHANUMERIC ;
ALPHA           : [a-zA-Z]+ ;

fragment ALPHANUMERIC : ALPHA (ALLOWEDATTCHAR)* ;

fragment ALLOWEDATTCHAR : '-' | '_' | [0-9] | ALPHA ;
Cœur

1 Answer

ANTLR4 accepts these tokens as valid strings.

I doubt that.

Assuming your ALPHA rule looks like this:

fragment ALPHA : [a-zA-Z];

I'm sure ANTLR does not tokenize "--a" or "__a" as a STRING.

Check the output stream ANTLR writes its errors and warnings to: chances are ANTLR is reporting a problem there, then recovering from it and continuing to lex and parse.
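
To illustrate what "recovers and continues" can look like for an input such as "--a", here is a small hand-written Java sketch. It is not ANTLR's generated lexer and uses no ANTLR API; the class RecoveringLexerSketch and its fields are hypothetical, and the error text only imitates the style of the lexer's messages. It reports every character that cannot start a STRING, skips it, and carries on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a recovering lexer; NOT ANTLR's implementation.
class RecoveringLexerSketch {
    final List<String> tokens = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isAllowed(char c) {
        return isAlpha(c) || (c >= '0' && c <= '9') || c == '-' || c == '_';
    }

    void lex(String input) {
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isAlpha(c)) {
                // A STRING starts here: consume the letter and all allowed characters after it.
                int start = i++;
                while (i < input.length() && isAllowed(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                // No token can start here: record an error, skip one character, keep going.
                errors.add("token recognition error at: '" + c + "'");
                i++;
            }
        }
    }

    public static void main(String[] args) {
        RecoveringLexerSketch lexer = new RecoveringLexerSketch();
        lexer.lex("--a");
        System.out.println("tokens: " + lexer.tokens);
        System.out.println("errors: " + lexer.errors);
    }
}
```

For "--a" this sketch records two errors and still produces the single token "a", so code that only looks at the resulting tokens never sees that the leading characters were rejected.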

EDIT

If you want to override ANTLR's default error handling/reporting, see: Handling errors in ANTLR4

Community
Bart Kiers
  • With ALPHA : [a-zA-Z]+ ; the input is parsed as "a" and the visitor implementation methods are called, where context.getText() returns just "a". I expected the token to be treated as invalid and BaseErrorListener.syntaxError(..) to be called. – user1610746 Dec 16 '14 at 22:58
  • Thanks, Bart Kiers. After adding an error handler for both the parser and the lexer, I was able to report the error. – user1610746 Dec 29 '14 at 22:30