ANTLR4 getCharPositionInLine() not working as intended in LEXER rule

Question

ANTLR Grammar:

grammar Test;

/*
 * Parser Rules
 */
rpg_file                    : f_section
                              EOF? ;

f_section                   : f_line* ;

f_line                      : f_start f_content f_end ;

f_start                     : F_START ;
f_content                   : F_CONTENT ;
f_end                       : ANYTHING_BUT_NL* NEWLINE;

/*
 * Lexer Rules
 */
fragment F                  : ('F' | 'f') ;
fragment LOWERCASE          : [a-z] ;
fragment UPPERCASE          : [A-Z] ;
fragment DIGIT              : [0-9] ;
fragment CHAR               : (LOWERCASE | UPPERCASE) ;
fragment DIGIT_CHAR         : (DIGIT | CHAR) ;

fragment WHITESPACE         : (' ' | '\t') ;
fragment WS_OR_TEXT         : (WHITESPACE | DIGIT_CHAR) ;

fragment F_FILENAME         : WS_OR_TEXT?
                              WS_OR_TEXT?
                              WS_OR_TEXT?
                              WS_OR_TEXT?
                              WS_OR_TEXT?
                              WS_OR_TEXT?
                              WS_OR_TEXT? ;

/*
 * First 5 characters in RPG format are unused and should be ignored
 */
fragment SPEC_START         : {getCharPositionInLine() == 0}?
                              ANYTHING_BUT_NL
                              ANYTHING_BUT_NL
                              ANYTHING_BUT_NL
                              ANYTHING_BUT_NL
                              ANYTHING_BUT_NL ;

F_START                     : SPEC_START F ;
F_CONTENT                   : {getCharPositionInLine() == 6}? CHAR F_FILENAME ;

ANYTHING_BUT_NL             : ~('\n' | '\r') ;
NEWLINE                     : ('\r'? '\n' | '\r') ;

ANTLR Input:

     FFILENAMEIF  E           K        DISK
     FFILENA01UF  E           K        DISK

In the RPG files that I am trying to parse, the specification type is determined by the character in column 6 of each line. An 'F' in column 6 means the line is a file description specification, and the following 8 characters (columns 7 - 14) are the file name. I am trying to extract the file name in the f_content parser rule using the F_CONTENT lexer rule.

I expect the F_CONTENT lexer rule to match only when the token starts at character position 6 (column 7) of a line, so that it extracts columns 7 - 14, which are reserved for the file name. What actually happens is that the file name is extracted, but the 8 characters that follow it are then also matched as F_CONTENT, so the getCharPositionInLine() predicate does not appear to be applied as intended.
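
To make the expected result concrete, the column logic can be sketched in plain Java, independent of ANTLR. This is only an illustration of the intended extraction, not a fix for the grammar; the class name FSpecColumns and the method extractFileName are hypothetical, and trimming the padded name is an assumption about the desired output.

```java
public class FSpecColumns {
    // Column 6 (1-based) holds the specification type; as a 0-based index that is 5.
    static final int SPEC_INDEX = 5;
    // Columns 7-14 (1-based) hold the file name: 0-based indices 6 to 13.
    static final int NAME_START = 6;
    static final int NAME_END = 14; // exclusive

    /**
     * Returns the file name of an F-spec line with padding trimmed,
     * or null if the line has no 'F' or 'f' in column 6.
     */
    static String extractFileName(String line) {
        if (line.length() <= SPEC_INDEX) {
            return null; // too short to have a column 6
        }
        char spec = line.charAt(SPEC_INDEX);
        if (spec != 'F' && spec != 'f') {
            return null; // not a file description specification
        }
        // Tolerate lines that end before column 14.
        int end = Math.min(NAME_END, line.length());
        return line.substring(NAME_START, end).trim();
    }

    public static void main(String[] args) {
        String[] lines = {
            "     FFILENAMEIF  E           K        DISK",
            "     FFILENA01UF  E           K        DISK"
        };
        for (String line : lines) {
            System.out.println(extractFileName(line));
        }
    }
}
```

For the two input lines above, this prints FILENAME and FILENA01, which is what f_content should contain for each line; the text from column 15 onward should not be matched as F_CONTENT.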

I also use getCharPositionInLine() in the SPEC_START lexer rule, but that does not work either: with the full grammar, it matches sequences that begin in the middle of a line, about 30 characters in.

For inspiration have a look at the ANTLR4 based rpgleparser on github. https://github.com/rpgleparser/rpgleparser — Christoff Erasmus, Jun 08 '19 at 20:00
