
I would like to parse the following expressions with ANTLR4:

termspannear ( xxx, xxx , 5 , true ) 

termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true ) 

The termspannear functions can be nested.

Here is my grammar:

// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start       : TERMSPAN ;

TERMSPAN    : TERMSPANNEAR | 'xxx' ;
TERMSPANNEAR: 'termspannear' OPENP BODY CLOSEP ;
BODY        : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;
COMMA       : ',' ;
OPENP       : '(' ;
CLOSEP      : ')' ;
SLOP        : [0-9]+ ;
ORDERED     : 'true' | 'false' ;
WS          : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

After running:

antlr4 TermSpanNear.g4
javac TermSpanNear*.java
grun TermSpanNear start -gui
termspannear ( xxx, xxx , 5 , true )
^D
line 1:0 token recognition error at: 'termspannear '
line 1:13 extraneous input '(' expecting TERMSPAN

and the tree looks like:

[image: parse tree shown by grun -gui]

Can someone help me fix this grammar so that the parse tree contains all parameters and nesting also works?

NOTE: Following the suggestion in the answer, I rewrote it as:

// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start       : termspan EOF;

termspan    : termspannear | 'xxx' ;
termspannear: 'termspannear' '('  body  ')' ;
body        : termspan ',' termspan ',' SLOP ',' ORDERED ;

SLOP        : [0-9]+ ;
ORDERED     : 'true' | 'false' ;
WS          : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

I think it works now. I get the following trees. For

termspannear ( xxx, xxx , 5 , true ) 

[image: parse tree shown by grun -gui]

For
termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )

[image: parse tree shown by grun -gui]

Seki
szydan

1 Answer


You're using way too many lexer rules.

When you're defining a token like this:

BODY        : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;

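then the tokenizer (lexer) will try to create the (single!) token `xxx,xxx,5,true`. In other words, it does not allow any spaces inside it: the `WS -> skip` rule only applies between tokens, never inside one. That is also where your `token recognition error at: 'termspannear '` comes from: the single TERMSPANNEAR token requires `(` immediately after `termspannear`, but the lexer finds a space. Lexer rules (the ones starting with a capital letter) should really be the "atoms" of your language (its smallest parts). Whenever you start creating elements like a body, you glue atoms together in parser rules, not in lexer rules.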
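To make the lexer/parser split concrete, here is a small hand-written lexer in plain Java. It is only a sketch, not what ANTLR generates; the class `TermSpanLexer` and its token labels are hypothetical. It emits nothing but atoms (punctuation, numbers, booleans and the two literals) and skips whitespace between tokens, the same job `WS : [ \t\r\n]+ -> skip ;` does, so spaces around `(` or `,` no longer matter. One simplification: it reads a whole run of letters and then classifies it, rather than doing ANTLR's longest-match selection among rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the TermSpanNear "atoms" (not ANTLR output).
// It only cuts the input into small tokens; grouping them into a body is the parser's job.
class TermSpanLexer {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // like WS -> skip: dropped between tokens
            } else if (c == '(') {
                tokens.add("OPENP");
                i++;
            } else if (c == ')') {
                tokens.add("CLOSEP");
                i++;
            } else if (c == ',') {
                tokens.add("COMMA");
                i++;
            } else if (Character.isDigit(c)) {         // SLOP : [0-9]+
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("SLOP(" + input.substring(start, i) + ")");
            } else if (Character.isLetter(c)) {        // read a word, then classify it
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                String word = input.substring(start, i);
                if (word.equals("termspannear") || word.equals("xxx")) {
                    tokens.add("'" + word + "'");      // literal tokens used in parser rules
                } else if (word.equals("true") || word.equals("false")) {
                    tokens.add("ORDERED(" + word + ")");
                } else {
                    throw new IllegalArgumentException("token recognition error at: '" + word + "'");
                }
            } else {
                throw new IllegalArgumentException("token recognition error at: '" + c + "'");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Spaces around '(' and ',' make no difference to the token stream.
        System.out.println(tokenize("termspannear ( xxx, xxx , 5 , true )"));
        System.out.println(tokenize("termspannear(xxx,xxx,5,true)"));
    }
}
```

Both calls in `main` print the same ten tokens, starting with `'termspannear'` and `OPENP` and ending with `ORDERED(true)` and `CLOSEP`. Deciding that the middle part forms a `body` is left entirely to the parser rules.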

Try something like this:

grammar TermSpanNear;

// parser rules (the elements)
start          : termspan EOF ;
termspan       : termspannear | 'xxx' ;
termspannear   : 'termspannear' OPENP body CLOSEP ;
body           : termspan COMMA termspan COMMA SLOP COMMA ORDERED ;

// lexer rules (the atoms)
COMMA          : ',' ;
OPENP          : '(' ;
CLOSEP         : ')' ;
SLOP           : [0-9]+ ;
ORDERED        : 'true' | 'false' ;
WS             : [ \t\r\n]+ -> skip ;
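If it helps to see why this version handles nesting, here is a hand-written recursive-descent parser in plain Java. It is a sketch, not the code ANTLR generates; the class `HandWrittenTermSpanParser` and its LISP-style output format are hypothetical. It has one method per parser rule above, and nesting falls out naturally because `body()` calls `termspan()`, which may call `termspannear()` again. To stay self-contained it uses a crude regex tokenizer that splits the input into words, numbers and punctuation and skips whitespace between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written recursive-descent parser (not ANTLR output):
// one method per parser rule, each returning a LISP-style subtree string.
class HandWrittenTermSpanParser {
    // Crude tokenizer: a word, a number or one of ( ) , with optional leading whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\s*([A-Za-z]+|[0-9]+|[(),])");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    HandWrittenTermSpanParser(String input) {
        Matcher m = TOKEN.matcher(input);
        int i = 0;
        while (!input.substring(i).trim().isEmpty()) {
            m.region(i, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("token recognition error at index " + i);
            }
            tokens.add(m.group(1));
            i = m.end();
        }
    }

    // start : termspan EOF ;
    String start() {
        String tree = termspan();
        if (pos != tokens.size()) throw error("<EOF>", peek());
        return "(start " + tree + ")";
    }

    // termspan : termspannear | 'xxx' ;
    String termspan() {
        if (peek().equals("termspannear")) {
            return "(termspan " + termspannear() + ")";   // recursion enables nesting
        }
        expect("xxx");
        return "(termspan xxx)";
    }

    // termspannear : 'termspannear' OPENP body CLOSEP ;
    String termspannear() {
        expect("termspannear");
        expect("(");
        String body = body();
        expect(")");
        return "(termspannear " + body + ")";
    }

    // body : termspan COMMA termspan COMMA SLOP COMMA ORDERED ;
    String body() {
        String first = termspan();
        expect(",");
        String second = termspan();
        expect(",");
        String slop = next();
        if (!slop.matches("[0-9]+")) throw error("SLOP", slop);
        expect(",");
        String ordered = next();
        if (!ordered.equals("true") && !ordered.equals("false")) throw error("ORDERED", ordered);
        return "(body " + first + " " + second + " " + slop + " " + ordered + ")";
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private String next() {
        String t = peek();
        if (pos < tokens.size()) pos++;
        return t;
    }

    private void expect(String expected) {
        String found = next();
        if (!found.equals(expected)) throw error("'" + expected + "'", found);
    }

    private static IllegalStateException error(String expected, String found) {
        return new IllegalStateException("expecting " + expected + " but found '" + found + "'");
    }

    public static void main(String[] args) {
        System.out.println(new HandWrittenTermSpanParser(
            "termspannear ( xxx, xxx , 5 , true )").start());
        System.out.println(new HandWrittenTermSpanParser(
            "termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )").start());
    }
}
```

For the nested input it produces two `(termspannear ...)` nodes, the inner one sitting in the second `termspan` of the outer `body`. With ANTLR itself, calling `toStringTree(parser)` on the tree returned by `parser.start()` gives a comparable LISP-style dump, although its exact format differs from this sketch (it also lists the punctuation tokens).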
Bart Kiers
  • Thanks! Look, I've edited the question: I followed your suggestion and I think it now works as I wanted. – szydan Mar 27 '14 at 21:24
  • @szydan, yeah, that looks correct to me. In case you're wondering: of the two `termspannear` nodes, one is the parser rule and the one below it is the literal token containing the text `'termspannear'`. – Bart Kiers Mar 27 '14 at 21:27
  • Yes, I figured that out. It parses my input correctly. Now I have to figure out how to use the generated Java code ;-) I'll probably be asking more questions very soon. Thank you! – szydan Mar 27 '14 at 21:57
  • @szydan, no problem. The next step will probably be to create a listener or visitor. See: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Parse+Tree+Listeners – Bart Kiers Mar 27 '14 at 22:01
  • Bart, I've generated both listeners and visitors (with the `-visitor` flag). The page you mentioned has an example of how to use a listener, but none for a visitor. Do I also use a walker, or how does it work? – szydan Mar 27 '14 at 22:16
  • Here's a complete example of how to use a visitor: http://stackoverflow.com/questions/15610183/if-else-statements-in-antlr-using-listeners – Bart Kiers Mar 27 '14 at 22:31
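On the walker question: in the ANTLR 4 Java runtime a listener is driven by `ParseTreeWalker` (for example `ParseTreeWalker.DEFAULT.walk(listener, tree)`), whereas a visitor needs no walker: you call `visitor.visit(tree)` and each `visitXxx` method decides itself whether to recurse, via `visit(child)` or `visitChildren(ctx)`. The self-contained sketch below imitates that visitor style on a small hand-built tree so it runs without ANTLR; the types `Node`, `Term`, `Near`, `TermSpanVisitor` and the two visitors are hypothetical stand-ins for the generated visitor and context classes.

```java
// Hypothetical AST for termspannear expressions (not ANTLR's generated context classes).
interface Node {
    <T> T accept(TermSpanVisitor<T> visitor);
}

// 'xxx'
class Term implements Node {
    public <T> T accept(TermSpanVisitor<T> v) { return v.visitTerm(this); }
}

// termspannear ( left , right , slop , ordered )
class Near implements Node {
    final Node left, right;
    final int slop;
    final boolean ordered;

    Near(Node left, Node right, int slop, boolean ordered) {
        this.left = left;
        this.right = right;
        this.slop = slop;
        this.ordered = ordered;
    }

    public <T> T accept(TermSpanVisitor<T> v) { return v.visitNear(this); }
}

// Analogue of a generated visitor interface: one method per node type.
interface TermSpanVisitor<T> {
    T visitTerm(Term term);
    T visitNear(Near near);

    // Entry point, analogous to calling visitor.visit(tree) in ANTLR.
    default T visit(Node node) { return node.accept(this); }
}

// Computes how deeply termspannear calls are nested.
// The visitor itself decides when to descend into children; no walker is involved.
class DepthVisitor implements TermSpanVisitor<Integer> {
    public Integer visitTerm(Term term) { return 0; }

    public Integer visitNear(Near near) {
        return 1 + Math.max(visit(near.left), visit(near.right));
    }
}

// Renders the tree back to source text, again by explicit recursion.
class PrintVisitor implements TermSpanVisitor<String> {
    public String visitTerm(Term term) { return "xxx"; }

    public String visitNear(Near near) {
        return "termspannear(" + visit(near.left) + ", " + visit(near.right)
            + ", " + near.slop + ", " + near.ordered + ")";
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        // termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )
        Node tree = new Near(new Term(), new Near(new Term(), new Term(), 5, true), 5, true);
        System.out.println(new DepthVisitor().visit(tree));
        System.out.println(new PrintVisitor().visit(tree));
    }
}
```

Here `DepthVisitor` returns 2 for the nested example and `PrintVisitor` renders it back as `termspannear(xxx, termspannear(xxx, xxx, 5, true), 5, true)`. With the generated code you would instead extend `TermSpanNearBaseVisitor<T>`, override the `visitXxx` methods you care about, and start it with `visitor.visit(tree)`.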