
I am trying to create a very simple ANTLR grammar file which should parse the following file:

Report (MyReport)
Begin
End

Or without a report name:

Report
Begin
End

And here is my grammar file:

grammar RL;

options {
  language = Java;
}

report:
  REPORT ('(' SPACE* STRING_LITERAL SPACE* ')')?
  BEGIN
  END
  ;

REPORT
    :   'Report'
    ;     

BEGIN
    :   'Begin'
    ;

END :   'End';

NAME:   LETTER (LETTER | DIGIT | '_')*;

STRING_LITERAL :    NAME SPACE*;

fragment LETTER: LOWER | UPPER;

fragment LOWER: 'a'..'z';

fragment UPPER: 'A'..'Z';

fragment DIGIT: '0'..'9';

fragment SPACE: ' ' | '\t';

WHITESPACE: SPACE+ { $channel = HIDDEN; };

rule: ;

However, when I debug in ANTLRWorks, I always get the following error:

 root -> report -> MismatchedTokenException(0!=0)

What's wrong with my grammar file?

Thanks, Green

Gelin Luo

1 Answer


A couple of remarks:

  • Java is the default language, so you can omit the options { language = Java; } block;
  • you're using SPACE inside a parser rule, but SPACE is a fragment rule, and the lexer never creates tokens for fragment rules: remove it from your parser rule(s);
  • the input "Report " ("Report" followed by a single white-space character) is being tokenized as a STRING_LITERAL, not as a REPORT! ANTLR's lexer consumes characters greedily: it picks the rule that matches the most characters, and only when two or more rules match the same number of characters does the rule defined first take precedence. The lexer also does not take into account which token the parser expects at that point (tokenization and parsing are performed independently!).
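To make the last remark concrete, here is a minimal Java sketch of the "longest match wins, first-defined rule wins a tie" behaviour described above. It is a model of that rule, not ANTLR's generated lexer: the class name MaximalMunchDemo is hypothetical, the regexes only approximate the question's lexer rules, and the implicit '(' and ')' literal tokens are listed first by assumption. Java's greedy lookingAt() happens to give the longest match for these particular patterns, which is not true of regexes in general.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunchDemo {

    // One lexer rule: a token type plus a regex approximating the ANTLR rule.
    record Rule(String type, Pattern pattern) {}

    // Order mirrors the question's grammar (implicit '(' and ')' literal tokens
    // first). Order only matters when two rules match equally long input.
    static final List<Rule> RULES = List.of(
            new Rule("'('", Pattern.compile("\\(")),
            new Rule("')'", Pattern.compile("\\)")),
            new Rule("REPORT", Pattern.compile("Report")),
            new Rule("BEGIN", Pattern.compile("Begin")),
            new Rule("END", Pattern.compile("End")),
            new Rule("NAME", Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*")),
            new Rule("STRING_LITERAL", Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*[ \\t]*")),
            new Rule("WHITESPACE", Pattern.compile("[ \\t]+")));

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Rule rule : RULES) {
                Matcher m = rule.pattern().matcher(input);
                m.region(pos, input.length());
                // Longest match wins; on a tie the rule defined first is kept,
                // because a later rule must be strictly longer to replace it.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.type();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalStateException("no lexer rule matches at index " + pos);
            }
            tokens.add(bestType + " \"" + input.substring(pos, pos + bestLen) + "\"");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: [STRING_LITERAL "Report ", '(' "(", NAME "MyReport", ')' ")"]
        System.out.println(tokenize("Report (MyReport)"));
    }
}
```

Note that "MyReport" comes out as NAME, not STRING_LITERAL: both match 8 characters and NAME is defined first. So even the report name does not produce the token the parser rule asks for.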

Try the following instead:

grammar RL;

report
 : REPORT ('(' NAME ')')? BEGIN END
 ;

REPORT : 'Report';     
BEGIN  : 'Begin';
END    : 'End';
NAME   : LETTER (LETTER | DIGIT | '_')*;

fragment LETTER : LOWER | UPPER;
fragment LOWER  : 'a'..'z';
fragment UPPER  : 'A'..'Z';
fragment DIGIT  : '0'..'9';

SPACE  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };

green wrote:

What if I want to allow "SPACE" inside Report NAME?

I would still skip spaces in the lexer. Accepting spaces between names while ignoring them in other contexts would result in some clunky rules. Instead of accounting for spaces inside a report's name, I would do something like this:

report
 : REPORT ('(' report_name ')')? BEGIN END
 ;

report_name
 : NAME+
 ;

resulting in the following parse tree:

(parse tree image not reproduced)

for the input:

Report (a name with spaces)
Begin
End

green wrote:

So is it possible to allow me to use reserved words like 'Report' in the name?

Sure, explicitly add them to the report_name rule:

report_name
 : (NAME | REPORT | BEGIN | END)+
 ;
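As a rough cross-check of what the final grammar accepts, here is a hand-written recursive-descent sketch in Java. It is not ANTLR-generated code, and the class ReportParser and its methods are hypothetical. It mimics the corrected lexer: whitespace is discarded (like the HIDDEN SPACE rule), and a word that is exactly Report, Begin or End becomes a keyword token, which is what REPORT/BEGIN/END winning the equal-length tie with NAME amounts to. Its report_name also accepts those keyword tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class ReportParser {
    enum Type { REPORT, BEGIN, END, NAME, LPAREN, RPAREN, EOF }

    record Token(Type type, String text) {}

    private final List<Token> tokens;
    private int pos = 0;

    public ReportParser(String input) { this.tokens = lex(input); }

    // Lexer: whitespace is dropped (like the HIDDEN SPACE rule). A word equal to
    // "Report", "Begin" or "End" becomes a keyword, any other word a NAME.
    static List<Token> lex(String input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++;
            } else if (c == '(' || c == ')') {
                out.add(new Token(c == '(' ? Type.LPAREN : Type.RPAREN, String.valueOf(c)));
                i++;
            } else if (isLetter(c)) {
                int start = i++;
                // NAME : LETTER (LETTER | DIGIT | '_')*
                while (i < input.length() && isNameChar(input.charAt(i))) {
                    i++;
                }
                String word = input.substring(start, i);
                Type type = switch (word) {
                    case "Report" -> Type.REPORT;
                    case "Begin" -> Type.BEGIN;
                    case "End" -> Type.END;
                    default -> Type.NAME;
                };
                out.add(new Token(type, word));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at index " + i);
            }
        }
        out.add(new Token(Type.EOF, "<EOF>"));
        return out;
    }

    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    static boolean isNameChar(char c) { return isLetter(c) || (c >= '0' && c <= '9') || c == '_'; }

    private Type peek() { return tokens.get(pos).type(); }

    private Token expect(Type type) {
        Token t = tokens.get(pos);
        if (t.type() != type) {
            throw new IllegalStateException("expected " + type + " but found " + t.type() + " '" + t.text() + "'");
        }
        pos++;
        return t;
    }

    // report : REPORT ('(' report_name ')')? BEGIN END ;
    // Returns the report name, or null when the optional part is absent.
    public String report() {
        expect(Type.REPORT);
        String name = null;
        if (peek() == Type.LPAREN) {
            expect(Type.LPAREN);
            name = reportName();
            expect(Type.RPAREN);
        }
        expect(Type.BEGIN);
        expect(Type.END);
        expect(Type.EOF);
        return name;
    }

    // report_name : (NAME | REPORT | BEGIN | END)+ ;
    private String reportName() {
        List<String> words = new ArrayList<>();
        while (peek() == Type.NAME || peek() == Type.REPORT || peek() == Type.BEGIN || peek() == Type.END) {
            words.add(expect(peek()).text());
        }
        if (words.isEmpty()) {
            throw new IllegalStateException("report_name needs at least one word, found " + peek());
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(new ReportParser("Report (a name with spaces)\nBegin\nEnd").report());
        System.out.println(new ReportParser("Report (My Report)\nBegin\nEnd").report());
    }
}
```

Because the spaces never reach the parser, the sketch rebuilds the name by joining the words with single spaces, so the original spacing inside the parentheses is not preserved. That is a direct consequence of skipping whitespace in the lexer.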
Bart Kiers
  • Thank you very much. What if I want to allow "SPACE" inside Report NAME? – Gelin Luo Jun 25 '12 at 17:49
  • I followed the "report_name" approach and got this error: "T:\tmp\RL\RL.g:11:1: The following token definitions can never be matched because prior tokens match the same input: REPORT_NAME". The source code is at https://gist.github.com/2991022 – Gelin Luo Jun 25 '12 at 20:29
  • @green, no, you did not follow the approach. You used a lexer rule `REPORT_NAME` while I used a parser rule `report_name`. Be sure you understand the difference between the two: http://stackoverflow.com/questions/4297770/practical-difference-between-parser-rules-and-lexer-rules-in-antlr – Bart Kiers Jun 25 '12 at 20:49
  • hmm... this is really tricky. thx a lot! Now I've made it work. The new problem is that I can't use the word "Report" in the report_name; it reports `UnwantedTokenException(found=Report)`. I think it's because I've defined `REPORT : 'Report';`, so is it possible to allow me to use reserved words like 'Report' in the name? – Gelin Luo Jun 25 '12 at 22:03
  • @green, see my revised answer. – Bart Kiers Jun 26 '12 at 07:21