Defining Antlr lexer rule with termination condition

Question

I need to parse two tokens that are separated by ‘2/’. Both tokens consist of alphanumeric characters and have no fixed length.

Examples: Abcd34D22/ERTD34D or ABCD2/DEF

Desired output for the first example: TOKEN1 = ‘Abcd34D2’, SEPARATOR = ‘2/’, TOKEN2 = ‘ERTD34D’

I would like to know if there is a way to define the lexer rule for TOKEN1 so that the ambiguity is resolved: a 2 that is followed by / should be matched as part of SEPARATOR instead of being consumed by TOKEN1. Below are sample token definitions for illustration.

fragment ALPHANUM: [0-9A-Za-z];
fragment SLASH: '/';
TOKEN1 : (ALPHANUM)+;
SEPARATOR : '2' SLASH -> mode(TOKEN2_MODE);
mode TOKEN2_MODE;
TOKEN2 : (ALPHANUM)+;

score 1 · Accepted Answer · answered Feb 10 '21 at 21:47

AFAIK, you'll have to use a predicate, which means you'll have to add some target-specific code to your grammar. If your target language is Java, you could do something like this:

TOKEN1
 : TOKEN1_ATOM+
 ;

fragment TOKEN1_ATOM
 : [013-9A-Za-z]              // match a single alpha-num except '2'
 | '2' {_input.LA(1) != '/'}? // only match `2` if there's no '/' after it
 ;
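To sanity-check the idea outside of ANTLR, here is a minimal plain-Java sketch that mimics the same rule with a regex: the negative lookahead `2(?!/)` plays the role of the predicate. This is only an analogy, not how the generated lexer works internally, and the class name `SeparatorSplit` and the method `split` are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorSplit {
    // Group 1 (TOKEN1): alphanumerics except '2', or a '2' NOT followed by '/'.
    // Group 2 (SEPARATOR): the literal "2/".
    // Group 3 (TOKEN2): one or more alphanumerics.
    private static final Pattern LINE = Pattern.compile(
        "((?:[013-9A-Za-z]|2(?!/))+)(2/)([0-9A-Za-z]+)");

    // Returns {TOKEN1, SEPARATOR, TOKEN2}, or null if the input does not fit.
    static String[] split(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String s : new String[] { "Abcd34D22/ERTD34D", "ABCD2/DEF" }) {
            System.out.println(s + " -> " + String.join(" | ", split(s)));
        }
    }
}
```

For the two examples from the question, this should give `Abcd34D2 | 2/ | ERTD34D` and `ABCD | 2/ | DEF`, which is the split the predicate is meant to produce.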
