How can I write this grammar expression for ANTLR4 input?

Original expression:

<int_literal> = 0|(1 -9){0 -9}
<char_literal> = ’( ESC |~( ’|\| LF | CR )) ’

<string_literal> = "{ ESC |~("|\| LF | CR )}"

I tried the following expression:

int_literal : '0' | ('1'..'9')('0'..'9')*;
char_literal : '('ESC' | '~'('\'|'''|'LF'|'CR'))';

But it returned:

syntax error: '\' came as a complete surprise to me

syntax error: mismatched input ')' expecting SEMI while matching a rule
unterminated string literal

1 Answer

Your quotes don't match:

'('ESC' | '~'('\'|'''|'LF'|'CR'))'
^ ^   ^   ^ ^ ^ ^ 
| |   |   | | | |
o c   o   c o c error

o is open, c is close

I'd read "{ ESC |~("|\| LF | CR )}" as this:

// A string literal is zero or more escape sequences or chars other
// than ", \, \r and \n, enclosed in double quotes
StringLiteral
 : '"' ( Escape | ~( '"' | '\\' | '\r' | '\n' ) )* '"'
 ;

Escape
 : '\\' ???
 ;

Also note that ANTLR4 has shorthand character classes ([0-9] is equivalent to '0'..'9'), so you can do this:

IntLiteral
 : '0' 
 | [1-9] [0-9]*
 ;

StringLiteral
 : '"' ( Escape | ~["\\\r\n] )* '"'
 ;
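
The <char_literal> from the question can be written along the same lines. The following is a minimal sketch, not a definitive rule: the Escape fragment shown here is an assumption (the original notation does not say which escape sequences ESC covers), and the grammar name LiteralSketch is hypothetical.

```antlr
lexer grammar LiteralSketch;

// Exactly one escape sequence, or one char other than ', \, \r and \n,
// enclosed in single quotes. Inside a literal, a single quote is written '\''
// and a backslash is written '\\'.
CharLiteral
 : '\'' ( Escape | ~['\\\r\n] ) '\''
 ;

// ASSUMPTION: a typical set of escapes; adjust to whatever ESC means
// in the language being described.
fragment Escape
 : '\\' [btnfr"'\\]
 ;
```

Unlike StringLiteral, there is no * after the group, because the original expression allows exactly one character or escape between the quotes.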

Also note that lexer rules start with an uppercase letter! Otherwise they become parser rules (see: Practical difference between parser rules and lexer rules in ANTLR?).
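
To illustrate the naming rule, here is a small combined grammar, given only as a sketch. The grammar name LiteralDemo, the parser rule literal and the WS rule are hypothetical additions for illustration.

```antlr
grammar LiteralDemo;

// Parser rule: starts with a lowercase letter and matches a sequence of tokens.
literal
 : IntLiteral EOF
 ;

// Lexer rule: starts with an uppercase letter and matches characters,
// producing a single IntLiteral token.
IntLiteral
 : '0'
 | [1-9] [0-9]*
 ;

// Skip whitespace between tokens.
WS
 : [ \t\r\n]+ -> skip
 ;
```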

Bart Kiers