I have the following grammar to parse a list of integers separated by whitespace:

grammar TEST;

test
    : expression* EOF
    ;

expression
    : integerLiteral
    ;

integerLiteral
    : INTLITERAL
    ;

PLUS: '+';
MINUS: '-';

DIGIT: '0'..'9';
DIGITS: DIGIT+;
INTLITERAL: (PLUS|MINUS)? DIGITS;

WS: [ \t\r\n] -> skip;

It does not work! If I pass "100" I get:

line 1:0 extraneous input '100' expecting {<EOF>, INTLITERAL}

However, if I remove the lexer rule INTLITERAL and move its definition into the parser rule integerLiteral, like this:

integerLiteral
    : (PLUS|MINUS)? DIGITS
    ;

Now it seems to work just fine!

I feel that if I can understand why this happens, I'll begin to understand some of the idiosyncrasies I am running into.

jross

1 Answer

The lexer creates tokens in the following manner:

  1. try to match as many characters as possible for a single token
  2. if two or more rules match the same number of characters, let the rule defined first "win"

Given the two rules above, you can see that your lexer rules:

DIGITS: DIGIT+;
INTLITERAL: (PLUS|MINUS)? DIGITS;

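are the problem. For the input 100, rule 2 applies: both DIGITS and INTLITERAL match all of 100, but since DIGITS is defined before INTLITERAL, a DIGITS token is created. The parser rule integerLiteral expects an INTLITERAL token, which is why you get the "extraneous input" error.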
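The two rules can be tried out with a toy lexer. This is only a sketch in Python, not ANTLR itself: the `tokenize` helper and the hand-translated regular expressions are assumptions made for illustration, and they model nothing beyond "longest match wins, first rule wins on a tie".

```python
import re

def tokenize(rules, text):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.match(pattern, text[pos:])
            # Strictly greater: an earlier rule keeps the token on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # mimic ANTLR's `-> skip`
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Lexer rules from the question, in their original order,
# hand-translated to regular expressions (an assumption of this sketch).
ORIGINAL = [
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("DIGIT", r"[0-9]"),
    ("DIGITS", r"[0-9]+"),
    ("INTLITERAL", r"[+-]?[0-9]+"),
    ("WS", r"[ \t\r\n]"),
]

print(tokenize(ORIGINAL, "100"))   # [('DIGITS', '100')]
print(tokenize(ORIGINAL, "-100"))  # [('INTLITERAL', '-100')]
print(tokenize(ORIGINAL, "7"))     # [('DIGIT', '7')]
```

Note that in this model a signed number such as -100 does come out as INTLITERAL, because no earlier rule matches as many characters. Only unsigned input is captured by DIGIT or DIGITS.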

Solution 1

Move INTLITERAL above DIGITS:

INTLITERAL: (PLUS|MINUS)? DIGITS;
DIGIT: '0'..'9';
DIGITS: DIGIT+;

But notice that DIGIT and DIGITS will now never become tokens on their own, because INTLITERAL matches the same input and is defined first. In that case you can make both of these rules fragments, and then it doesn't matter where you place them, because fragment rules are only used inside other lexer rules (never in parser rules).

Solution 2

Make DIGIT and DIGITS fragments:

fragment DIGIT: '0'..'9';
fragment DIGITS: DIGIT+;
INTLITERAL: (PLUS|MINUS)? DIGITS;
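
To see why Solutions 1 and 2 end up equivalent, here is a rough Python model (not ANTLR; `tokenize` and the regex translations are hypothetical helpers for this sketch). Fragments are modelled by leaving DIGIT and DIGITS out of the token rule list and inlining them into INTLITERAL.

```python
import re

def tokenize(rules, text):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.match(pattern, text[pos:])
            # Strictly greater: an earlier rule keeps the token on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # mimic ANTLR's `-> skip`
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Solution 1: INTLITERAL moved above DIGIT and DIGITS.
REORDERED = [
    ("PLUS", r"\+"), ("MINUS", r"-"),
    ("INTLITERAL", r"[+-]?[0-9]+"),
    ("DIGIT", r"[0-9]"), ("DIGITS", r"[0-9]+"),
    ("WS", r"[ \t\r\n]"),
]

# Solution 2: DIGIT and DIGITS are fragments, so they are not token rules.
FRAGMENTS = [
    ("PLUS", r"\+"), ("MINUS", r"-"),
    ("INTLITERAL", r"[+-]?[0-9]+"),
    ("WS", r"[ \t\r\n]"),
]

for rules in (REORDERED, FRAGMENTS):
    print(tokenize(rules, "100 7 -3"))
# Both print: [('INTLITERAL', '100'), ('INTLITERAL', '7'), ('INTLITERAL', '-3')]
```

In the reordered variant DIGIT and DIGITS are still listed, but they can never win a tie against INTLITERAL, so declaring them as fragments just states that intent explicitly.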

Solution 3

Or better, don't glue the operator onto INTLITERAL, but match it in a unary expression:

expression
    : (MINUS | PLUS) expression
    | expression (MINUS | PLUS) expression
    | integerLiteral
    ;

integerLiteral
    : INTLITERAL
    ;

PLUS: '+';
MINUS: '-';

fragment DIGIT: '0'..'9';

INTLITERAL: DIGIT+;
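
One plausible reason to prefer this, shown with a toy Python lexer (a sketch under the assumption that "longest match, then first rule" is all that matters; `tokenize` and the regexes are hypothetical, not ANTLR): once the sign is part of INTLITERAL, an input like 1-2 no longer contains a MINUS token for the binary alternative of expression to match.

```python
import re

def tokenize(rules, text):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.match(pattern, text[pos:])
            # Strictly greater: an earlier rule keeps the token on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # mimic ANTLR's `-> skip`
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Sign glued onto the literal, as in Solutions 1 and 2.
SIGNED = [
    ("PLUS", r"\+"), ("MINUS", r"-"),
    ("INTLITERAL", r"[+-]?[0-9]+"),
    ("WS", r"[ \t\r\n]"),
]

# Sign left to the parser, as in Solution 3.
UNSIGNED = [
    ("PLUS", r"\+"), ("MINUS", r"-"),
    ("INTLITERAL", r"[0-9]+"),
    ("WS", r"[ \t\r\n]"),
]

print(tokenize(SIGNED, "1-2"))
# [('INTLITERAL', '1'), ('INTLITERAL', '-2')]
print(tokenize(UNSIGNED, "1-2"))
# [('INTLITERAL', '1'), ('MINUS', '-'), ('INTLITERAL', '2')]
```

With the unsigned variant the parser sees the operator as its own token and can decide from context whether it is unary or binary.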
sepp2k
Bart Kiers
  • ah ... thanks for the answer. I had not run across as clear as an explanation as you provided! – jross May 27 '20 at 19:44
  • Yeah, there are a lot of `extraneous input ... expecting` questions here on SO, but I never feel comfortable closing questions as duplicate since the solution always involves some grammar specific hints. You're welcome @jross. – Bart Kiers May 27 '20 at 20:12