First of all, I have read the solutions for the following similar questions: q1 q2 q3

Still, I don't understand why I get the following message:

line 1:0 missing 'PROGRAM' at 'PROGRAM'

when I try to match the following:

PROGRAM test
BEGIN
END

My grammar:

grammar Wengo;

program           : PROGRAM id BEGIN pgm_body END ;
id                : IDENTIFIER ;
pgm_body          : decl func_declarations ;
decl              : string_decl decl | var_decl decl | empty ;

string_decl       : STRING id ASSIGN str SEMICOLON ;
str               : STRINGLITERAL ;

var_decl          : var_type id_list SEMICOLON ;
var_type          : FLOAT | INT ;
any_type          : var_type | VOID ; 
id_list           : id id_tail ;
id_tail           : COMA id id_tail | empty ;

param_decl_list   : param_decl param_decl_tail | empty ;
param_decl        : var_type id ;
param_decl_tail   : COMA param_decl param_decl_tail | empty ;

func_declarations : func_decl func_declarations | empty ;
func_decl         : FUNCTION any_type id (param_decl_list) BEGIN func_body END ;
func_body         : decl stmt_list ;

stmt_list         : stmt stmt_list | empty ;
stmt              : base_stmt | if_stmt | loop_stmt ; 
base_stmt         : assign_stmt | read_stmt | write_stmt | control_stmt ;

assign_stmt       : assign_expr SEMICOLON ;
assign_expr       : id ASSIGN expr ;
read_stmt         : READ ( id_list )SEMICOLON ;
write_stmt        : WRITE ( id_list )SEMICOLON ;
return_stmt       : RETURN expr SEMICOLON ;

expr              : expr_prefix factor ;
expr_prefix       : expr_prefix factor addop | empty ;
factor            : factor_prefix postfix_expr ;
factor_prefix     : factor_prefix postfix_expr mulop | empty ;
postfix_expr      : primary | call_expr ;
call_expr         : id ( expr_list ) ;
expr_list         : expr expr_list_tail | empty ;
expr_list_tail    : COMA expr expr_list_tail | empty ;
primary           : ( expr ) | id | INTLITERAL | FLOATLITERAL ;
addop             : ADD | MIN ;
mulop             : MUL | DIV ;

if_stmt           : IF ( cond ) decl stmt_list else_part ENDIF ;
else_part         : ELSE decl stmt_list | empty ;
cond              : expr compop expr | TRUE | FALSE ;
compop            : LESS | GREAT | EQUAL | NOTEQUAL | LESSEQ | GREATEQ ;
while_stmt        : WHILE ( cond ) decl stmt_list ENDWHILE ;

control_stmt      : return_stmt | CONTINUE SEMICOLON | BREAK SEMICOLON ;
loop_stmt         : while_stmt | for_stmt ;
init_stmt         : assign_expr | empty ;
incr_stmt         : assign_expr | empty ;
for_stmt          : FOR ( init_stmt SEMICOLON cond SEMICOLON incr_stmt ) decl stmt_list ENDFOR ;

COMMENT         : '--' ~[\r\n]* -> skip ;
WS              : [ \t\r\n]+ -> skip ;
NEWLINE         : [ \n] ;
EMPTY           : $ ;

KEYWORD         : PROGRAM|BEGIN|END|FUNCTION|READ|WRITE|IF|ELSE|ENDIF|WHILE|ENDWHILE|RETURN|INT|VOID|STRING|FLOAT|TRUE|FALSE|FOR|ENDFOR|CONTINUE|BREAK ;
OPERATOR        : ASSIGN|ADD|MIN|MUL|DIV|EQUAL|NOTEQUAL|LESS|GREAT|LBRACKET|RBRACKET|SEMICOLON|COMA|LESSEQ|GREATEQ ;

IDENTIFIER      : [a-zA-Z][a-zA-Z0-9]* ;
INTLITERAL      : [0-9]+ ;
FLOATLITERAL    : [0-9]*'.'[0-9]+ ;
STRINGLITERAL   : '"' (~[\r\n"] | '""')* '"' ;

PROGRAM     : 'PROGRAM'; 
BEGIN       : 'BEGIN';
END         : 'END';
FUNCTION    : 'FUNCTION';
READ        : 'READ';
WRITE       : 'WRITE';
IF          : 'IF';
ELSE        : 'ELSE';
ENDIF       : 'ENDIF';
WHILE       : 'WHILE';
ENDWHILE    : 'ENDWHILE';
RETURN      : 'RETURN';
INT         : 'INT';
VOID        : 'VOID';
STRING      : 'STRING';
FLOAT       : 'FLOAT' ;
TRUE        : 'TRUE';
FALSE       : 'FALSE';
FOR         : 'FOR';
ENDFOR      : 'ENDFOR';
CONTINUE    : 'CONTINUE';
BREAK       : 'BREAK';

ASSIGN      : ':='; 
ADD     : '+';
MIN     : '-'; 
MUL     : '*';
DIV     : '/'; 
EQUAL       : '='; 
NOTEQUAL    : '!='; 
LESS        : '<';
GREAT       : '>'; 
LBRACKET    : '('; 
RBRACKET    : ')';
SEMICOLON   : ';';
COMA        : ',';
LESSEQ      : '<=';
GREATEQ     : '>=';

From what I've read, I think there's a mismatch between KEYWORD and PROGRAM, but removing KEYWORD altogether does not solve the problem.

EDIT: Removing KEYWORD gives the following message:

line 3:0 mismatched input 'END' expecting {'INT', 'STRING', 'FLOAT', '+'}

This is my grun output when KEYWORD is present:

[@0,0:6='PROGRAM',<KEYWORD>,1:0]
[@1,8:11='test',<IDENTIFIER>,1:8]
[@2,13:17='BEGIN',<KEYWORD>,2:0]
[@3,19:21='END',<KEYWORD>,3:0]
[@4,23:22='<EOF>',<EOF>,4:0]
line 1:0 mismatched input 'PROGRAM' expecting 'PROGRAM'
(program PROGRAM test BEGIN END)

This is the output when KEYWORD is removed:

[@0,0:6='PROGRAM',<'PROGRAM'>,1:0]
[@1,8:11='test',<IDENTIFIER>,1:8]
[@2,13:17='BEGIN',<'BEGIN'>,2:0]
[@3,19:21='END',<'END'>,3:0]
[@4,23:22='<EOF>',<EOF>,4:0]
line 3:0 mismatched input 'END' expecting {'INT', 'STRING', 'FLOAT', '+'}
(program PROGRAM (id test) BEGIN (pgm_body decl func_declarations) END)
Nht_e0
  • What precisely does "does not solve the problem" mean? Do you still get the exact same error message? Which tokens are generated for your input (you can find this out with `grun` using the `-tokens` flag or in your Java code by iterating over the token stream and just printing all the tokens)? – sepp2k Sep 06 '18 at 15:14
  • @sepp2k I edited the question – Nht_e0 Sep 06 '18 at 15:34
  • I'm getting lots of errors if I try to run your grammar (an unexpected `$` in the definition of `EMPTY`, lower case `empty` is not defined, the expression rules are mutually left-recursive), so I can't reproduce your issue. It looks like you may have re-typed your grammar by hand and introduced a couple of mistakes in the process. Please copy-and-paste your grammar, so I can reproduce your issue. – sepp2k Sep 06 '18 at 15:37
  • Note: After going through your grammar and fixing the compilation errors, it works fine for me. I think the problem lies in your definition of the `empty` rule (which unfortunately is the one rule you didn't include in the grammar you've posted here). – sepp2k Sep 06 '18 at 15:43
  • I see, Can you tell me which version was it? One with KEYWORD or the one without? You are right, problem is in the definition of `empty` How can I define an empty string in Antlr4? I tried the following but none of them worked `EMPTY :$` or `EMPTY :^$` – Nht_e0 Sep 06 '18 at 16:08
  • Without. The one with `KEYWORD` is just wrong. – sepp2k Sep 06 '18 at 16:11
  • Let me ask you this: If you successfully define an `EMPTY` rule, what do you expect the token stream to look like? How many `EMPTY` tokens should there be and where in the stream would they be located? Keep in mind that the lexer does not know or care which types of tokens the parser is expecting right now. – sepp2k Sep 06 '18 at 16:15

1 Answer

The error about "missing 'PROGRAM'" has been solved when you removed the KEYWORD rule (note that you should also remove the OPERATOR rule for the same reasons).
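
For context on why the combined rule gets in the way: when two lexer rules match the same text at the same length, ANTLR picks the rule that is defined first. Below is a minimal sketch, assuming a trimmed-down lexer (the grammar name KeywordOrderSketch is made up for illustration), with no KEYWORD wrapper and the keyword rules placed ahead of IDENTIFIER:

```antlr
// Hypothetical, trimmed-down lexer; only three keywords are shown.
lexer grammar KeywordOrderSketch;

// Keyword rules come first, so they win the tie against IDENTIFIER
// when the input is exactly 'PROGRAM', 'BEGIN' or 'END'.
PROGRAM    : 'PROGRAM' ;
BEGIN      : 'BEGIN' ;
END        : 'END' ;

// Deliberately no "KEYWORD : PROGRAM | BEGIN | END ;" rule above these.
IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
WS         : [ \t\r\n]+ -> skip ;
```

If a KEYWORD rule were defined above these rules, the lexer would emit KEYWORD tokens for the same text, and a parser rule asking for PROGRAM would never see one. That is consistent with the <KEYWORD> token types in the first token dump in the question.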

The error you're encountering now is completely unrelated.

Your current problem concerns the definition of empty, which you didn't show. You've said that you tried both EMPTY : $ ; and EMPTY : ^$ ; (and then presumably empty: EMPTY;), but none of those even compile, so they wouldn't cause the parse error you posted. Either way, the concept of an EMPTY token can't work. When would such a token be generated? Once between every other token? In that case, you'd get a lot of "unexpected EMPTY" errors. No, the whole point of an empty rule is that it should succeed without consuming any tokens.

To achieve that, you can just define empty : ; and remove EMPTY altogether. Alternatively, you could remove empty as well and just use an empty alternative (i.e. | ;) wherever you're currently using empty. Either approach will make your code work, but there's a better way.
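
As an aside, the two quick fixes just described might look roughly like this. It is only a sketch: the grammar name EmptySketch and the placeholder stmt rule are assumptions, not part of the original grammar.

```antlr
// Hypothetical grammar name; stmt is a stand-in, not the real rule.
grammar EmptySketch;

// Option 1: keep a named rule, but make it a parser rule with an empty body.
// It succeeds without consuming any token.
stmt_list  : stmt stmt_list | empty ;
empty      : ;

// Option 2: drop the named rule and leave the last alternative blank.
id_tail    : ',' id id_tail | ;

stmt       : id id_tail ';' ;   // placeholder statement for this sketch
id         : IDENTIFIER ;

IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
WS         : [ \t\r\n]+ -> skip ;
```

Neither option involves the lexer at all, which is the point: nothing named EMPTY ever appears in the token stream.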

You're using empty as the base case for rules that basically amount to lists. ANTLR offers the repetition operators * (0 or more) and + (1 or more), as well as the ? operator to make things optional. These allow you to define lists non-recursively and without an empty rule. For example, stmt_list could be defined like this:

stmt_list : stmt* ;

And id_list like this:

id_list : (id (',' id)*)? ;
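
The same pattern should carry over to the other list-like rules in the question. Here is a sketch under a few assumptions: the grammar name ListSketch is invented, keywords are written as inline literals to keep it short, and STRINGLITERAL is simplified.

```antlr
// Hypothetical grammar name; rules adapted from the question for illustration.
grammar ListSketch;

// Zero or more declarations, replacing "string_decl decl | var_decl decl | empty".
decl            : (string_decl | var_decl)* ;
string_decl     : 'STRING' id ':=' STRINGLITERAL ';' ;
var_decl        : var_type id (',' id)* ';' ;   // at least one id
var_type        : 'FLOAT' | 'INT' ;

// Possibly empty, comma-separated parameter list.
param_decl_list : (param_decl (',' param_decl)*)? ;
param_decl      : var_type id ;

id              : IDENTIFIER ;

IDENTIFIER      : [a-zA-Z][a-zA-Z0-9]* ;
STRINGLITERAL   : '"' ~[\r\n"]* '"' ;   // simplified: no escaped quotes
WS              : [ \t\r\n]+ -> skip ;
```

Leaving off the outer ? (as in var_decl here) keeps a list non-empty, which is what the original id id_tail form required. Wrapping it in ( ... )? is the way to go where an empty list is allowed, such as parameter or argument lists.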

On an unrelated note, your grammar can be simplified greatly by making use of the fact that ANTLR 4 supports direct left recursion, so you can get rid of all the different expression rules and just have one that's left-recursive.

That'd give you:

expr : primary
     | id '(' expr_list ')'
     | expr mulop expr
     | expr addop expr
     ;

The rules expr_prefix, factor, factor_prefix, postfix_expr and call_expr could then all be removed.
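
To try the left-recursive rule in isolation, a self-contained sketch could look like the following. The grammar name ExprSketch is made up, operators are written as inline literals, and every parenthesis that is meant as a token is quoted.

```antlr
// Hypothetical grammar name; rule names follow the question.
grammar ExprSketch;

// Earlier alternatives bind tighter, so mulop takes precedence over addop.
expr         : primary
             | id '(' expr_list ')'
             | expr mulop expr
             | expr addop expr
             ;
expr_list    : (expr (',' expr)*)? ;
primary      : '(' expr ')' | id | INTLITERAL | FLOATLITERAL ;
addop        : '+' | '-' ;
mulop        : '*' | '/' ;
id           : IDENTIFIER ;

IDENTIFIER   : [a-zA-Z][a-zA-Z0-9]* ;
INTLITERAL   : [0-9]+ ;
FLOATLITERAL : [0-9]* '.' [0-9]+ ;
WS           : [ \t\r\n]+ -> skip ;
```

The quoting matters: an unquoted ( expr ) is just grouping, so primary would begin with expr and the two rules would become mutually left-recursive. A quoted '(' makes primary start with a token instead.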

sepp2k
  • I see. If so, what 's the point of having empty string? As you explained we can t avoid using it altogether. – Nht_e0 Sep 06 '18 at 18:15
  • In addition, now I get the following error: `error(119): Wengo.g4::: The following sets of rules are mutually left-recursive [expr, primary, expr_prefix, factor, factor_prefix, postfix_expr]` – Nht_e0 Sep 06 '18 at 18:16
  • @Nht_e0 If you use the `*` and `?` operators, there's no need for the `empty` rule and you can remove it. The errors about the mutual left recursion are about a bunch of places where you wrote `(` and `)` when you meant `'('` and `')'` respectively. I assumed that only happened when you posted the code here (i.e. your real code already had the quotes) because you didn't get that error message earlier. If you never had those quotes, you shouldn't have gotten as far as you did. – sepp2k Sep 06 '18 at 18:21
  • Sorry about that, I missed the ( and ). Now it works fine. – Nht_e0 Sep 06 '18 at 18:37