ANTLR Grammar:
grammar Test;
/*
* Parser Rules
*/
rpg_file : f_section
EOF? ;
f_section : f_line* ;
f_line : f_start f_content f_end ;
f_start : F_START ;
f_content : F_CONTENT ;
f_end : ANYTHING_BUT_NL* NEWLINE;
/*
* Lexer Rules
*/
fragment F : ('F' | 'f') ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
fragment DIGIT : [0-9] ;
fragment CHAR : (LOWERCASE | UPPERCASE) ;
fragment DIGIT_CHAR : (DIGIT | CHAR) ;
fragment WHITESPACE : (' ' | '\t') ;
fragment WS_OR_TEXT : (WHITESPACE | DIGIT_CHAR) ;
fragment F_FILENAME : WS_OR_TEXT?
WS_OR_TEXT?
WS_OR_TEXT?
WS_OR_TEXT?
WS_OR_TEXT?
WS_OR_TEXT?
WS_OR_TEXT? ;
/*
* First 5 characters in RPG format are unused and should be ignored
*/
fragment SPEC_START : {getCharPositionInLine() == 0}?
ANYTHING_BUT_NL
ANYTHING_BUT_NL
ANYTHING_BUT_NL
ANYTHING_BUT_NL
ANYTHING_BUT_NL ;
F_START : SPEC_START F ;
F_CONTENT : {getCharPositionInLine() == 6}? CHAR F_FILENAME ;
ANYTHING_BUT_NL : ~('\n' | '\r') ;
NEWLINE : ('\r'? '\n' | '\r') ;
ANTLR Input:
FFILENAMEIF E K DISK
FFILENA01UF E K DISK
In the RPG files I am trying to parse, the type of specification on a line is determined by the character in column 6. If column 6 contains an 'F', the line is a file description specification, and the following 8 characters (columns 7-14) are the file name. I am trying to extract the file name in the f_content parser rule using the F_CONTENT lexer rule.
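To make the column arithmetic concrete: RPG columns are counted from 1, while ANTLR's getCharPositionInLine() counts from 0. Column 6 is therefore position 5, and the file name in columns 7-14 occupies positions 6-13. The following is a minimal Python sketch of that layout only (it is not ANTLR code); the helper name file_spec_name and the sample lines, which are assumed to include the five unused leading columns, are hypothetical.

```python
from typing import Optional

# Hypothetical constants: 0-based indices for the 1-based RPG columns.
SPEC_TYPE_INDEX = 5   # column 6 holds the specification type
FILENAME_START = 6    # column 7, i.e. getCharPositionInLine() == 6
FILENAME_END = 14     # one past column 14 (slice end is exclusive)


def file_spec_name(line: str) -> Optional[str]:
    """Return the file name if `line` is an F spec, otherwise None."""
    if len(line) <= SPEC_TYPE_INDEX or line[SPEC_TYPE_INDEX] not in "Ff":
        return None
    # Columns 7-14 hold the file name; strip any trailing padding spaces.
    return line[FILENAME_START:FILENAME_END].strip()


print(file_spec_name("     FFILENAMEIF  E           K DISK"))  # FILENAME
print(file_spec_name("     FFILENA01UF  E           K DISK"))  # FILENA01
```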
The expected behavior is that the F_CONTENT lexer rule only matches when the sequence starts at position 6 (column 7) of a line, thus extracting columns 7-14, which are reserved for the file name. What actually happens is that the file name is extracted, but the next 8 characters after it are also matched as F_CONTENT, which means the getCharPositionInLine() predicate is not being applied as intended.
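The intended token stream can be modelled outside ANTLR. Below is a minimal Python sketch, assuming a single F-spec line without its line terminator, that hand-mirrors the lexer rules F_START, F_CONTENT and ANYTHING_BUT_NL. The function name expected_tokens is hypothetical, and the sketch does not use the ANTLR runtime; it only spells out what "F_CONTENT may start only at position 6" should produce.

```python
import string
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, matched text)

LETTERS = string.ascii_letters                         # CHAR
WS_OR_TEXT = string.ascii_letters + string.digits + " \t"  # WS_OR_TEXT


def expected_tokens(line: str) -> List[Token]:
    """Hypothetical model of the tokens the grammar is meant to emit."""
    tokens: List[Token] = []
    pos = 0
    # F_START: five ignored columns plus 'F' or 'f' at index 5 (column 6).
    if len(line) >= 6 and line[5] in "Ff":
        tokens.append(("F_START", line[:6]))
        pos = 6
        # F_CONTENT may only start at index 6: one letter followed by
        # up to 7 letters, digits, spaces or tabs (greedy, like F_FILENAME).
        if pos < len(line) and line[pos] in LETTERS:
            end = pos + 1
            limit = min(len(line), pos + 8)
            while end < limit and line[end] in WS_OR_TEXT:
                end += 1
            tokens.append(("F_CONTENT", line[pos:end]))
            pos = end
    # Every remaining character becomes a one-character ANYTHING_BUT_NL.
    tokens.extend(("ANYTHING_BUT_NL", ch) for ch in line[pos:])
    return tokens


for tok in expected_tokens("     FFILENAMEIF  E")[:3]:
    print(tok)
```

In this model at most positions 6-13 ever become F_CONTENT; every character after the file name falls through to single-character ANYTHING_BUT_NL tokens, which is the behavior the predicate is expected to enforce.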
I also use a getCharPositionInLine() predicate in the SPEC_START fragment rule, but that does not work either: when I run the grammar on the full code, it matches sequences in the middle of a line, 30 characters in.