
I have the following in the lexer

INTEGER : DIGIT+;
NOT: '!';
MINUS:'-';
PLUS:'+';
fragment DIGIT: '0'..'9';

I have the following in the parser

expr:
   intLiteral
   | UnaryOp expr;

intLiteral: (PLUS|MINUS)? INTEGER;

UnaryOp: NOT|MINUS;

When I use grun to test it with -2, the input is matched as UnaryOp expr instead of just intLiteral. In other words, the minus sign is being detected as a UnaryOp. Why is this occurring, and is there a way to fix it?

Rishi
  • Please also provide the lexer rules so that we can copy-paste and try your grammar without having to guess the missing part. – BernardK Oct 26 '17 at 18:07

1 Answer

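ANTLR tells lexer and parser rules apart by the first letter of the rule name: upper case means a lexer rule, lower case means a parser rule. The usual practice is to use all CAPITALS for lexer rules and all lower case for parser rules. If you mix them, it causes errors like the one you have: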

$ grun Question expr -tokens -diagnostics input.txt 
[@0,0:0='-',<UnaryOp>,1:0]
[@1,1:1='2',<INTEGER>,1:1]
[@2,2:1='<EOF>',<EOF>,1:2]

Because it starts with a capital letter, UnaryOp is a lexer rule, not a parser rule as you may believe, and the - sign has been matched as a UnaryOp token, because this rule is defined before MINUS:'-';.

If the MINUS rule comes before UnaryOp, the - sign is matched as a MINUS token:

$ grun Question question -tokens -diagnostics input1.txt 
[@0,0:0='-',<'-'>,1:0]
[@1,1:1='2',<INTEGER>,1:1]
[@2,2:1='<EOF>',<EOF>,1:2]
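
As a minimal sketch of the smallest fix to the grammar in the question: renaming UnaryOp to unaryOp turns it into a parser rule, so the - sign can only be tokenized as MINUS. This assumes the lexer and parser rules live in one combined grammar. The grammar name Question is taken from the grun commands above, and I have not run this exact file:

```antlr
grammar Question;

expr
    :   intLiteral
    |   unaryOp expr
    ;

intLiteral : ( PLUS | MINUS )? INTEGER ;

// Lower-case first letter: this is now a parser rule built from tokens,
// so it no longer competes with MINUS inside the lexer.
unaryOp : NOT | MINUS ;

INTEGER : DIGIT+ ;
NOT     : '!' ;
MINUS   : '-' ;
PLUS    : '+' ;

fragment DIGIT : '0'..'9' ;
```

With this change, -2 can still be derived through either alternative of expr (intLiteral directly, or unaryOp followed by an unsigned intLiteral). As far as I know, ANTLR 4 resolves such an ambiguity in favor of the lowest-numbered alternative, here intLiteral, and may report it when -diagnostics is used.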

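Also, intLiteral overlaps with expr: intLiteral accepts an optional sign and expr accepts a unary minus, so an input such as -2 can be parsed in two ways. It is simpler to handle the sign only in expr and make the literal one of its alternatives.

The following grammar follows my style (file Question.g4):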

grammar Question;

question
@init {System.out.println("Question last update 2108");}
    :   line+ EOF
    ;

line
    :   expr NL
        {System.out.println("Expression found : " + $expr.text); }
    ;

expr
    :   ( PLUS | MINUS ) expr   # exprUnaryOp
    |   expr PLUS  expr         # exprAddition
    |   expr MINUS expr         # exprSubtraction
    |   atom                    # exprAtom
    ;

atom
    :   INTEGER
    |   ID
    ;

ID      : LETTER ( LETTER | DIGIT | '_' )* ;      
INTEGER : DIGIT+ ;
NOT     : '!' ;
MINUS   : '-' ;
PLUS    : '+' ;
NL      : [\r\n] ;
WS      : [ \t] -> channel(HIDDEN) ; // -> skip ;

fragment LETTER : [a-zA-Z] ;
fragment DIGIT  : [0-9] ;

File input.txt :

-2
- 2
1 - 2
3 + 4
5
abc + def

Execution :

$ grun Question question -tokens -diagnostics input.txt 
[@0,0:0='-',<'-'>,1:0]
[@1,1:1='2',<INTEGER>,1:1]
[@2,2:2='\n',<NL>,1:2]
[@3,3:3='-',<'-'>,2:0]
[@4,4:4=' ',<WS>,channel=1,2:1]
[@5,5:5='2',<INTEGER>,2:2]
...
[@25,27:29='def',<ID>,6:6]
[@26,30:30='\n',<NL>,6:9]
[@27,31:30='<EOF>',<EOF>,7:0]
Question last update 2108
Expression found : -2
Expression found : - 2
Expression found : 1 - 2
Expression found : 3 + 4
Expression found : 5
Expression found : abc + def
BernardK
  • Hello, thanks. How would I approach it if I had another kind of expression (expr), let's say a boolean? In that case, there cannot be a plus in front of the boolean. – Rishi Oct 26 '17 at 20:40
  • Just add one alternative for each kind of expression you want to match. If `||` is your boolean `or`, then `| expr '||' expr # exprOr`. See [here](https://stackoverflow.com/questions/14064697/antlr-implicit-multiplication/14147356#14147356). Search SO for answers from ANTLR gurus like [Bart](https://stackoverflow.com/users/50476/bart-kiers). – BernardK Oct 26 '17 at 21:17