I have some data that needs to be parsed. I am using the ANTLR4 tool to generate Java parsers and lexers, which I can use to build structured data from the input given below.

Grammar:

grammar SUBDATA;
subdata:
    data+;
data:
    array;
array:
    '[' obj (',' obj)* ']';
intarray:
    '[' number (',' number)* ']';
number:
    INT;
obj:
    '{' pair (',' pair)* '}';
pair:
    key '=' value;
key:
    WORD;
value:
    INT | WORD | intarray;
WORD:
    [A-Za-z0-9]+;
INT:
    [0-9]+;
WS:
    [ \t\n\r]+ -> skip;

Test Input Data:

[
    {OmedaDemographicType=1, OmedaDemographicId=100, OmedaDemographicValue=4}, 
    {OmedaDemographicType=1, OmedaDemographicId=101, OmedaDemographicValue=26}, 
    {
        OmedaDemographicType=2, OmedaDemographicId=102, OmedaDemographicValue=[16,34]
    }
]

Output:

line 5:79 mismatched input '16' expecting INT
line 5:82 mismatched input '34' expecting INT

[Image: GUI parse tree output]

The parser fails even though there is an integer value at the expected position.

zafar

1 Answer

You've made the classic mistake of not ordering your lexer rules properly. You should read and understand the priority rules and their consequences.

In your case, INT can never match: the WORD rule matches everything the INT rule can, and it is defined first in the grammar. When two lexer rules match the same text, ANTLR picks the one defined first, so the 16 and 34 from the example are lexed as WORDs.
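
The tie-breaking can be imitated outside ANTLR. The following is a minimal sketch, not ANTLR's actual implementation: a hypothetical `LexerPriorityDemo` class tries every rule at the current position, keeps the longest match, and on a tie keeps the rule listed first. The rule patterns are copied from the grammar in the question; the class and method names are made up for illustration, and punctuation is handled in a simplified way.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only imitates the "longest match, then first rule" policy.
class LexerPriorityDemo {
    // Lexer rules in the same order as the grammar in the question.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("WORD", Pattern.compile("[A-Za-z0-9]+"));
        RULES.put("INT", Pattern.compile("[0-9]+"));
        RULES.put("WS", Pattern.compile("[ \\t\\n\\r]+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule defined earlier is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestRule = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestRule == null) {
                // Simplification: characters no rule matches (such as '[' or '=')
                // are emitted as quoted literals.
                tokens.add("'" + input.charAt(pos) + "'");
                pos++;
                continue;
            }
            if (!bestRule.equals("WS")) { // WS is skipped, as in the grammar
                tokens.add(bestRule + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("OmedaDemographicValue=[16,34]"));
    }
}
```

With the rules in this order, `16` and `34` come out tagged as WORD, which is why the `intarray` rule in the parser never sees an INT.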

You should remove the ambiguity by not allowing a word to start with a digit:

WORD:
    [A-Za-z] [A-Za-z0-9]*;
INT:
    [0-9]+;

Or by swapping the order of the rules:

INT:
    [0-9]+;
WORD:
    [A-Za-z0-9]+;

In this case, a word can't be fully numeric, but it can still start with a digit.
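
The difference between the two fixes shows up on input that starts with digits and continues with letters. Here is a rough sketch that imitates the lexer's "longest match, then first rule" policy with plain regular expressions. The class `LexerFixComparison` and its rule tables are hypothetical names for illustration, and only the WORD and INT rules are modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two suggested rule sets.
class LexerFixComparison {
    // Each rule is {name, regex}; array order stands in for grammar order.
    static final String[][] NO_LEADING_DIGIT = {
        {"WORD", "[A-Za-z][A-Za-z0-9]*"},
        {"INT", "[0-9]+"}
    };
    static final String[][] INT_FIRST = {
        {"INT", "[0-9]+"},
        {"WORD", "[A-Za-z0-9]+"}
    };

    static List<String> tokenize(String input, String[][] rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (String[] rule : rules) {
                Matcher m = Pattern.compile(rule[1]).matcher(input);
                m.region(pos, input.length());
                // A strictly longer match wins; on a tie the earlier rule stays.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule[0];
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                pos++; // skip characters that neither rule covers
                continue;
            }
            tokens.add(bestRule + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"16", "16abc", "abc16"}) {
            System.out.println(text + " no-leading-digit: " + tokenize(text, NO_LEADING_DIGIT));
            System.out.println(text + " INT-first:        " + tokenize(text, INT_FIRST));
        }
    }
}
```

Both variants tag `16` as INT and `abc16` as WORD. They differ on `16abc`: the first variant splits it into an INT followed by a WORD, while the second keeps it as a single WORD because the longer match wins.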

Lucas Trzesniewski
  • Thanks for the answer. I thought it was the order of the lexer tokens within the parser rule that determined the match. I was thinking in regex terms, ignoring the fact that lexical tokens are generated first, before the parser rules are matched – zafar Sep 22 '16 at 17:17
  • In addition to Lucas' answer: if you see unexpected behavior in parsing, always start by looking at the tokens the lexer produced. This would have shown you that the token sequence was different from what you expected. – Mike Lischke Sep 23 '16 at 06:43