Flight numbers (for example, CZ3102) follow a simple rule: two uppercase letters followed by 3 or 4 digits. As a regular expression, that is: [A-Z]{2}[0-9]{3,4}.

How do I write this as a lexer rule in ANTLR4?

One straightforward lexer rule is: [A-Z][A-Z][0-9][0-9][0-9][0-9]?

But that is not very elegant, and when the range is large, such as 1-255, the lexer rule becomes hard to write this way.

Thanks

大蛋散
  • Under the hood, ANTLR4 uses Java regex, so maybe something like [this](http://www.regular-expressions.info/repeat.html) will work. – Giovanni Botta Dec 17 '14 at 01:41
  • @GiovanniBotta Your statement could not be farther from the truth. ANTLR 4 uses a custom implementation of an NFA simulation with on-demand DFA caching. There are no references to Java's Regular Expression implementation anywhere in the ANTLR 4 runtime, much less in the lexer implementation itself. – Sam Harwell Dec 17 '14 at 03:48
  • Wow, I couldn't believe it, so I checked the GitHub repo. There is basically no reference to java.util.regex. Good catch! Unfortunately, I could not find a reference for the ANTLR4 regex syntax. I used to have the book but no longer do, so I can't confirm it's in there. – Giovanni Botta Dec 17 '14 at 04:39
  • I guess [this](https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Lexer+Rules) is a start. – Giovanni Botta Dec 17 '14 at 04:43
  • So, how can I solve my problem? Or should I use the non-greedy operators: [A-Z]+?[0-9]+? – 大蛋散 Dec 17 '14 at 06:48

1 Answer

> But that is not very elegant, and when the range is large, such as 1-255, the lexer rule becomes hard to write this way.

Tokenize plain numbers in the lexer, and validate the numeric value in a parser listener or visitor.
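
As a rough illustration of the validation half of that advice, here is a minimal Java sketch. It assumes the lexer has a generic numeric rule along the lines of `NUMBER : [0-9]+ ;` and that the listener or visitor hands the token text to a helper. The class name `RangeCheck`, the method `inRange`, and the rule name `NUMBER` are all hypothetical and not part of ANTLR; the helper deliberately uses only the Java standard library.

```java
class RangeCheck {
    // Hypothetical helper: returns true when the text of a numeric token
    // parses as an int and lies within [min, max] inclusive.
    static boolean inRange(String tokenText, int min, int max) {
        try {
            int value = Integer.parseInt(tokenText);
            return value >= min && value <= max;
        } catch (NumberFormatException e) {
            // Not a valid int (for example, too many digits), so reject it.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inRange("255", 1, 255)); // prints "true"
        System.out.println(inRange("256", 1, 255)); // prints "false"
    }
}
```

In a generated listener, a check like this would typically be called from the `exit...` method of the rule that contains the number, passing the token's text, and an error would be reported when it returns false. The lexer then only has to recognise "some digits", and the range lives in ordinary code where it is easy to change.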

Bart Kiers