Is it possible to keep track of preceding tokens to resolve ambiguities in ANTLR4?

Question

I'm new to ANTLR4. I want to recognize the following format and perform an action depending on which token was read. This is what I'm trying to produce:

IDENTIFIER:Test1 ([a-zA-Z0-9]{10})
{insert 'Test1' in personId column}

CODE: F0101F

FULL_NAME: FIRST_NAME ([A-Z]+) LAST_NAME ([A-Z]+)
{insert FIRST_NAME.value in firstName column and insert LAST_NAME.value in lastName column}

ADRESS: DIGIT+ STREET_NAME ([A-Z]+)
{insert STREET_NAME.value in streetName column}

OTHER_INFORMATION: ([A-Z]+)
{insert OTHER_INFORMATION.value in other column}

What I did:

prod
:
    read_information+
;

read_information
:
    {getCurrentToken().getType()== ID }?

    idElement
    |
    {getCurrentToken().getType()== CODE }?

    codeElement
    |
    {getCurrentToken().getType()== FULLNAME}?

    fullNameElement
    |
    {getCurrentToken().getType()== STREET}?

    streetElement
    |
    {getCurrentToken().getType()== OTHER}?

    otherElement
;

codeElement
:
    CODE
    {getCurrentToken().getText().matches("[A-F0-9]{6}")}?
    codeInformation
    |
    {/*throw someException*/}
;

codeInformation
:
    HEXCODE
;

HEXCODE
:
    [a-fA-F0-9]+
;

CODE
:
    'CODE:'
;

otherElement
:
    OTHER otherInformation
;

otherInformation
:
    STR
;

OTHER
:
    'OTHER:'
;

streetElement
:
    STREET streetInformation
;

STREET
:
    'STREET:'
;

streetInformation
:
    STR
;

STR
:
    [a-zA-Z0-9]+
;

WORD
:
    [a-zA-Z]+
;

fullNameElement
:
    FULLNAME firstNameInformation lastNameInformation
;

FULLNAME
:
    'FULL_NAME:'
;

firstNameInformation
:
    WORD
;

lastNameInformation
:
    WORD
;

idElement
:
    ID idInformation
;

ID
:
    'ID:'
;

idInformation
:
    {getCurrentToken().getText().length()<=10}?

    STR
;

I'm not sure this is the right approach, since I have problems reading the WORD token. Because all the tokens have essentially the same format, I'm trying to find a way to keep track of the preceding token (or the context) to resolve the ambiguity, and to check the format at the same time (for example, throw an exception if a value is longer than 10 characters).
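In the question, the length and hex checks are done with semantic predicates inside the grammar. An alternative (an editorial suggestion, not part of the original answer) is to let the grammar accept the general token and validate the token text afterwards, for example from a visitor or listener, where a failed check can throw a descriptive exception. The sketch below is plain Java with no ANTLR dependency; `FieldFormat`, `checkId` and `checkCode` are hypothetical names, and the patterns assume the rules stated in the question (identifier: letters and digits, at most 10 characters; code: exactly six upper-case hex digits).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not ANTLR API: validates token text after parsing.
class FieldFormat {
    // IDENTIFIER values: letters/digits, 1 to 10 characters (assumed from the question).
    private static final Pattern ID = Pattern.compile("[a-zA-Z0-9]{1,10}");
    // CODE values: exactly six upper-case hex digits, as in the question's predicate.
    private static final Pattern HEX_CODE = Pattern.compile("[A-F0-9]{6}");

    // Returns the text unchanged if valid, otherwise throws.
    static String checkId(String text) {
        if (!ID.matcher(text).matches()) {
            throw new IllegalArgumentException("invalid IDENTIFIER value: " + text);
        }
        return text;
    }

    // Returns the text unchanged if valid, otherwise throws.
    static String checkCode(String text) {
        if (!HEX_CODE.matcher(text).matches()) {
            throw new IllegalArgumentException("invalid CODE value: " + text);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(checkId("Test1"));     // prints Test1
        System.out.println(checkCode("F0101F"));  // prints F0101F
        try {
            checkId("ABCDEFGHIJK");               // 11 characters, too long
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());   // invalid IDENTIFIER value: ABCDEFGHIJK
        }
    }
}
```

Keeping the checks out of the grammar means the parser never has to inspect the text of the next token, which is what the predicates in `codeElement` and `idInformation` were doing.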

Answer (edited May 23 '17 at 12:07)

To find out which rules the generated parser enters (i.e. which contexts are visited), you can have ANTLR generate visitors. There is a great explanation of this here (see Bart Kiers' response).
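To illustrate why visitors remove the need to remember the preceding token: each rule gets its own `visit...` method, so the method that runs already tells you which column a value belongs to. When ANTLR is run with the `-visitor` option it generates a visitor interface and a base class with one `visitXxx` method per parser rule (for example `visitFullNameElement`). So that the example compiles without a generated parser, the sketch below hand-rolls simplified stand-ins for those generated classes; the class names, sample values and column map are assumptions for illustration only, and the `firstname`/`lastname` fields mirror the labels suggested further down.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point: walks already-parsed elements with a visitor.
class VisitorDemo {
    static Map<String, String> toRow(List<Element> elements) {
        ColumnVisitor visitor = new ColumnVisitor();
        for (Element element : elements) {
            element.accept(visitor); // dispatches to the visit method for that element type
        }
        return visitor.row;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, as if a parser had already produced it.
        List<Element> parsed = Arrays.asList(
                new IdElement("Test1"),
                new FullNameElement("JOHN", "SMITH"),
                new StreetElement("MAIN"),
                new OtherElement("NONE"));
        System.out.println(toRow(parsed));
        // {personId=Test1, firstName=JOHN, lastName=SMITH, streetName=MAIN, other=NONE}
    }
}

// Simplified stand-ins (hypothetical) for the context classes and visitor ANTLR generates.
interface Element {
    <T> T accept(ElementVisitor<T> visitor);
}

interface ElementVisitor<T> {
    T visitIdElement(IdElement ctx);
    T visitFullNameElement(FullNameElement ctx);
    T visitStreetElement(StreetElement ctx);
    T visitOtherElement(OtherElement ctx);
}

class IdElement implements Element {
    final String id;
    IdElement(String id) { this.id = id; }
    public <T> T accept(ElementVisitor<T> v) { return v.visitIdElement(this); }
}

class FullNameElement implements Element {
    final String firstname; // mirrors the firstname=WORD label
    final String lastname;  // mirrors the lastname=WORD label
    FullNameElement(String firstname, String lastname) {
        this.firstname = firstname;
        this.lastname = lastname;
    }
    public <T> T accept(ElementVisitor<T> v) { return v.visitFullNameElement(this); }
}

class StreetElement implements Element {
    final String streetName;
    StreetElement(String streetName) { this.streetName = streetName; }
    public <T> T accept(ElementVisitor<T> v) { return v.visitStreetElement(this); }
}

class OtherElement implements Element {
    final String other;
    OtherElement(String other) { this.other = other; }
    public <T> T accept(ElementVisitor<T> v) { return v.visitOtherElement(this); }
}

// The column is chosen by which visit method runs, not by remembering earlier tokens.
class ColumnVisitor implements ElementVisitor<Void> {
    final Map<String, String> row = new LinkedHashMap<>();

    public Void visitIdElement(IdElement ctx) {
        row.put("personId", ctx.id);
        return null;
    }

    public Void visitFullNameElement(FullNameElement ctx) {
        row.put("firstName", ctx.firstname);
        row.put("lastName", ctx.lastname);
        return null;
    }

    public Void visitStreetElement(StreetElement ctx) {
        row.put("streetName", ctx.streetName);
        return null;
    }

    public Void visitOtherElement(OtherElement ctx) {
        row.put("other", ctx.other);
        return null;
    }
}
```

With a real generated parser, `ColumnVisitor` would instead extend the generated base visitor and read the values from the context objects ANTLR builds.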

In general, if two rules are identical, you can merge them into one and label each place it is used. For example, for these rules:

firstNameInformation
:
    WORD
;

lastNameInformation
:
    WORD
;

there is no reason to keep both. Instead, you could write the grammar for the full name this way:

fullNameElement
:
    FULLNAME firstname=WORD lastname=WORD
;

That way you only use the WORD token, but you label each occurrence so you can distinguish between them when walking the tree.
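One likely reason the WORD token causes trouble, which the answer does not address: an ANTLR 4 lexer picks the rule with the longest match and, when several rules match the same number of characters, the one defined first in the grammar. In the question's grammar STR (`[a-zA-Z0-9]+`) is defined before WORD (`[a-zA-Z]+`) and matches everything WORD matches, so WORD is never emitted; an all-hex word such as `ADA` even becomes HEXCODE. The plain-Java sketch below imitates that tie-breaking for the first token of an input, using the lexer rules in the order they appear in the question. It is a simplified model (`LexerTieBreak` and `firstToken` are hypothetical names), not ANTLR's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerTieBreak {
    // Lexer rules in the order they appear in the question's grammar.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("HEXCODE", Pattern.compile("[a-fA-F0-9]+"));
        RULES.put("CODE", Pattern.compile(Pattern.quote("CODE:")));
        RULES.put("OTHER", Pattern.compile(Pattern.quote("OTHER:")));
        RULES.put("STREET", Pattern.compile(Pattern.quote("STREET:")));
        RULES.put("STR", Pattern.compile("[a-zA-Z0-9]+"));
        RULES.put("WORD", Pattern.compile("[a-zA-Z]+"));
        RULES.put("FULLNAME", Pattern.compile(Pattern.quote("FULL_NAME:")));
        RULES.put("ID", Pattern.compile(Pattern.quote("ID:")));
    }

    // Name of the rule that would produce the first token of `input`:
    // the longest match wins; on equal length, the earlier rule wins.
    static String firstToken(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' keeps the earlier rule when lengths are equal.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"JOHN", "ADA", "F0101F", "Test1", "FULL_NAME:"}) {
            System.out.println(s + " -> " + firstToken(s));
        }
        // JOHN -> STR
        // ADA -> HEXCODE
        // F0101F -> HEXCODE
        // Test1 -> STR
        // FULL_NAME: -> FULLNAME
    }
}
```

Under this model, `firstname=WORD lastname=WORD` cannot match a name until the lexer rules are restructured, for example by collapsing the overlapping value tokens into one general token and checking its format after parsing.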
