
The source request rules are described in ANTLR *.g grammar files.

I'm trying to generate Java code using ANTLR4 and I'm getting errors like:

error(50): mql2.g4:9:7: syntax error: mismatched input ';' expecting RBRACE
error(50): mql2.g4:10:6: syntax error: mismatched input ';' expecting COLON while matching a lexer rule
error(50): mql2.g4:11:11: syntax error: mismatched input ';' expecting COLON while matching a lexer rule
error(50): mql2.g4:12:10: syntax error: mismatched input ';' expecting COLON while matching a lexer rule
error(50): mql2.g4:16:16: syntax error: '{package com.proquest.mql.queryTranslator;}' came as a complete surprise to me while matching rule preamble
error(50): mql2.g4:17:1: syntax error: 'lexer' came as a complete surprise to me while looking for an identifier
error(50): mql2.g4:19:11: syntax error: '^' came as a complete surprise to me
error(50): mql2.g4:19:16: syntax error: '!' came as a complete surprise to me
...

An example of the input file:

grammar mql2;

options {
    output=AST;
    k=2;
}

tokens {
    AND_OP;
    OR_OP;
    FIELD_CODE;
    FC_SUFFIX;
    }


@parser::header {package com.company.mql.queryTranslator;}
@lexer::header {package com.company.mql.queryTranslator;}

parse   : mql^ EOF!
    ;

mql : WS!* mqlx WS!* ( and_or^ WS! mqlx WS!*)*;

and_or
    : and_operator
    | or_operator
    ;

mqlx : search_item
     | LPAREN! mql^ RPAREN!
     | field_code field_phrase RPAREN -> ^(FIELD_CODE field_code field_phrase)
     | field_code_prefix field_code_suffix ->^(FIELD_CODE field_code_prefix field_code_suffix)
     ;

field_code
    : w=WORD^ LPAREN!
    ;

field_phrase
    : (WS!* (WORD|PHRASE|AND|OR))+
    ;

field_code_prefix
    : WORD^ '.'!;

field_code_suffix
    : field_code  (WORD|PHRASE) RPAREN!;

and_operator
    : AND->AND_OP | (/*empty*/->AND_OP) ;

or_operator
    : OR->OR_OP;

search_item
        :  NOT^ WS!* mqlx
    |  (WORD|PHRASE);

LPAREN : '(';

RPAREN : ')';

AND : ('a'|'A')('n'|'N')('d'|'D');

NOT : ('n'|'N')('o'|'O')('t'|'T');

OR : ('o'|'O')('r'|'R');

fragment
DIGIT  : ('0'..'9') ;

fragment
LETTER  : ('a'..'z' | 'A'..'Z'| 'á'| '*' | '&' | '-' | '.' | ',' | '?' | '!' | '/' | '\u0080'..'\ufffe');

SPECIAL_CHAR : ('\'' | '&');

WORD    : (LETTER|DIGIT|SPECIAL_CHAR)+;

WS : ( '\t' | ' ' | '\r' | '\n' | '|')+ /*{ $channel = HIDDEN; }*/;

fragment
QUOTE :   '"' ;

PHRASE  :   QUOTE (options {greedy=false;} : . )* QUOTE ;

So the questions are:

  • Which version of ANTLR is this file written for? (I started reading the ANTLR4 reference and continued with the ANTLR3 docs on their Confluence, but I haven't yet worked out which version applies.)
  • How do I fix ANTLR4 errors like "syntax error: '^' came as a complete surprise to me" or "syntax error: '!' came as a complete surprise to me"?
Sergii
  • `^` and `!` were used by ANTLR3 for AST (abstract syntax tree) creation. A token marked with `^` becomes the root of the tree the parser generates. A token marked with `!` is excluded from the generated tree. – ibre5041 May 15 '23 at 15:49
  • @ibre5041, so the first question is answered. Is there any substitute in ANTLR4 for `^` and `!`? – Sergii May 15 '23 at 15:55
  • It is a huge topic. See https://stackoverflow.com/questions/29971097/how-to-create-ast-with-antlr4 You have to write your own visitor to generate an AST. – ibre5041 May 15 '23 at 16:05
  • @ibre5041, I'm not sure I've got the idea yet, but thanks for your help. I'm going to figure out how it can be used. Thanks once more. – Sergii May 15 '23 at 16:23

1 Answer


As mentioned in the comments, the grammar is written for ANTLR3. I recommend you stop using v3 grammars: that version is rather old. Converting this one into a v4 grammar is easy:

grammar mql2;

@parser::header {package com.company.mql.queryTranslator;}
@lexer::header {package com.company.mql.queryTranslator;}

parse   : mql EOF
    ;

mql : mqlx (and_or mqlx)*;

and_or
    : and_operator
    | or_operator
    ;

mqlx : search_item
     | LPAREN mql RPAREN
     | field_code field_phrase RPAREN
     | field_code_prefix field_code_suffix
     ;

field_code
    : WORD LPAREN
    ;

field_phrase
    : (WORD|PHRASE|AND|OR)+
    ;

field_code_prefix
    : WORD '.';

field_code_suffix
    : field_code  (WORD|PHRASE) RPAREN;

and_operator
    : AND;

or_operator
    : OR;

search_item
    : NOT mqlx
    | (WORD|PHRASE)
    ;

LPAREN : '(';

RPAREN : ')';

AND : [aA] [nN] [dD];

NOT : [nN] [oO] [tT];

OR : [oO] [rR];

fragment
DIGIT  : [0-9];

fragment
LETTER  : [a-zA-Zá*&\-.,?!/\u0080-\ufffe];

fragment
SPECIAL_CHAR : ('\'' | '&');

WORD    : (LETTER|DIGIT|SPECIAL_CHAR)+;

WS : [\t \r\n|]+ -> channel(HIDDEN);

fragment
QUOTE :   '"' ;

PHRASE  :   QUOTE .*? QUOTE ;

Version 4 only gives you a parse tree; you cannot transform it into an AST inside the grammar, as was possible in version 3.
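
To give a rough idea of what replaces the `and_or^` rewriting from the v3 grammar, here is a minimal, self-contained sketch in plain Java. It is not ANTLR API: `AstSketch`, `Node` and `fold` are hypothetical names made up for illustration. In a real v4 visitor (generated with the `-visitor` option) the operands and operators would presumably come from the `mql` rule's context, for example via accessors such as `ctx.mqlx()` and `ctx.and_or()`; those accessor names are an assumption based on how ANTLR4 usually names generated methods.

```java
import java.util.List;

final class AstSketch {

    /** Minimal AST node: a label plus ordered children (hypothetical type, not part of ANTLR). */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }

        static Node leaf(String label) {
            return new Node(label, List.of());
        }

        /** Prints the tree in LISP style, similar to ANTLR's toStringTree() output. */
        @Override
        public String toString() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) {
                sb.append(' ').append(child);
            }
            return sb.append(')').toString();
        }
    }

    /**
     * Folds "operand (operator operand)*" into a left-associative tree.
     * Each operator becomes the root of everything parsed so far,
     * which is the effect and_or^ had in the v3 grammar.
     */
    static Node fold(List<Node> operands, List<String> operators) {
        if (operands.isEmpty() || operators.size() != operands.size() - 1) {
            throw new IllegalArgumentException("expected n operands and n-1 operators");
        }
        Node result = operands.get(0);
        for (int i = 0; i < operators.size(); i++) {
            result = new Node(operators.get(i), List.of(result, operands.get(i + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the query: cat AND dog OR bird
        Node ast = fold(
                List.of(Node.leaf("cat"), Node.leaf("dog"), Node.leaf("bird")),
                List.of("AND_OP", "OR_OP"));
        System.out.println(ast); // prints (OR_OP (AND_OP cat dog) bird)
    }
}
```

Tokens that were suppressed with `!` in v3 (parentheses, `EOF`) simply never get a `Node` in this approach: the visitor reads them from the parse tree and does not copy them into the AST.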

Bart Kiers