
I need to parse two tokens that are separated by '2/'. Both tokens consist of alphanumeric characters and have no fixed length.

Examples: Abcd34D22/ERTD34D or ABCD2/DEF

Desired output for the first example: TOKEN1 = 'Abcd34D2', SEPARATOR = '2/', TOKEN2 = 'ERTD34D'

I would like to know whether there is a way to define a lexer rule for TOKEN1 that resolves this ambiguity: if a '2' is followed by '/', it should be matched as part of SEPARATOR instead. Below are sample token definitions for illustration; as written, TOKEN1 also consumes that '2' and leaves the '/' unmatched.

fragment ALPHANUM: [0-9A-Za-z];
fragment SLASH: '/';
TOKEN1 : (ALPHANUM)+;
SEPARATOR : '2' SLASH -> mode(TOKEN2_MODE);
mode TOKEN2_MODE;
TOKEN2 : (ALPHANUM)+;
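To make the ambiguity concrete outside ANTLR, here is a small Java sketch using java.util.regex. It is only an analogy, not ANTLR's lexer, and the class GreedyVsLookahead and its prefix helper are hypothetical. The greedy pattern behaves like the TOKEN1 rule above and swallows the '2' that belongs to the separator; the second pattern admits a '2' only when a negative lookahead (?!/) confirms that no '/' follows, which is the kind of check a lexer rule would need to perform.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration using regular expressions, not ANTLR.
class GreedyVsLookahead {
    // Same character set as the question's TOKEN1: [0-9A-Za-z]+ (greedy).
    static final Pattern GREEDY = Pattern.compile("[0-9A-Za-z]+");

    // Any alphanumeric except '2', or a '2' that is NOT followed by '/'.
    // The negative lookahead (?!/) plays the role of a lexer predicate.
    static final Pattern GUARDED = Pattern.compile("(?:[013-9A-Za-z]|2(?!/))+");

    // Returns the match anchored at the start of input, or "" if there is none.
    static String prefix(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String input = "Abcd34D22/ERTD34D";
        System.out.println(prefix(GREEDY, input));  // Abcd34D22 (takes the separator's '2')
        System.out.println(prefix(GUARDED, input)); // Abcd34D2
    }
}
```

With the guarded pattern, ABCD2/DEF yields ABCD, leaving 2/ for the separator.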
Sumit Kathuria

1 Answer


AFAIK, you'll have to use a semantic predicate, which means adding some target-specific code to your grammar. If your target language is Java, you could do something like this:

TOKEN1
 : TOKEN1_ATOM+
 ;

fragment TOKEN1_ATOM
 : [013-9A-Za-z]              // match a single alpha-num except '2'
 | '2' {_input.LA(1) != '/'}? // only match `2` if there's no '/' after it
 ;
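The following is a minimal hand-written Java sketch of what the rule above does when combined with the question's SEPARATOR rule and TOKEN2_MODE, assuming plain string input. It is not ANTLR-generated code; the class SlashTwoLexer, its Token type, and the la helper are hypothetical. The check that the next character is '/' plays the role of the predicate {_input.LA(1) != '/'}?, and a boolean flag stands in for the lexer mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer mirroring the grammar; not ANTLR-generated code.
class SlashTwoLexer {
    enum Type { TOKEN1, SEPARATOR, TOKEN2 }

    static final class Token {
        final Type type;
        final String text;

        Token(Type type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + " = '" + text + "'";
        }
    }

    // Same character set as ALPHANUM: [0-9A-Za-z]
    static boolean isAlnum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Peek at the character at index i without consuming it; -1 past the end.
    static int la(String s, int i) {
        return i < s.length() ? s.charAt(i) : -1;
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        boolean token2Mode = false; // stands in for mode(TOKEN2_MODE)
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            if (token2Mode) {
                // TOKEN2 : ALPHANUM+ ;
                while (pos < input.length() && isAlnum(input.charAt(pos))) {
                    pos++;
                }
                if (pos == start) {
                    throw new IllegalArgumentException("unexpected character at " + pos);
                }
                tokens.add(new Token(Type.TOKEN2, input.substring(start, pos)));
            } else if (input.charAt(pos) == '2' && la(input, pos + 1) == '/') {
                // SEPARATOR : '2' SLASH -> mode(TOKEN2_MODE);
                pos += 2;
                tokens.add(new Token(Type.SEPARATOR, "2/"));
                token2Mode = true;
            } else {
                // TOKEN1 : TOKEN1_ATOM+ ; a '2' is accepted only if the next char is not '/'
                while (pos < input.length() && isAlnum(input.charAt(pos))
                        && !(input.charAt(pos) == '2' && la(input, pos + 1) == '/')) {
                    pos++;
                }
                if (pos == start) {
                    throw new IllegalArgumentException("unexpected character at " + pos);
                }
                tokens.add(new Token(Type.TOKEN1, input.substring(start, pos)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String example : new String[] {"Abcd34D22/ERTD34D", "ABCD2/DEF"}) {
            System.out.println(tokenize(example));
        }
    }
}
```

For Abcd34D22/ERTD34D this sketch produces TOKEN1 = 'Abcd34D2', SEPARATOR = '2/', TOKEN2 = 'ERTD34D', matching the desired output. Because TOKEN1 refuses a '2' that is directly followed by '/', the separator is the only rule that can match at that position, which is the same reason the predicate resolves the ambiguity in the grammar.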
Bart Kiers