ANTLR4 grammar doesn't recognize 20 as NUMBER

Question

I have created the following grammar (see below). When I parse the following string

"Schedule;cron(\"*/3 * * * * America/New_York\");'TestFile'yyyy-M-dd-HH-mm;America/New_York;20"

I get this error message:

"line 1:89: mismatched input '20' expecting NUMBER"

This seems strange, because as far as I can tell NUMBER: [0-9]+; should match "20". Where am I wrong?

Regards, Vladimir

    lexer grammar FileTriggerLexer;

@header { 
 }

STEP
:
    '/' INTEGER
;

SCHEDULE
:
    'Schedule'
;

SEMICOLON
:
    ';'
;

ASTERISK
:
    '*'
;

CRON
:
    'cron'
;

MARKET_CRON
:
    'marketCron'
;

COMBINED
:
    'combined'
;

FILE_FEED
:
    'FileFeed'
;

LBRACKET
:
    '('
;

RBRACKET
:
    ')'
;

PERCENT
:
    '%'
;

INTEGER
:
    [0-9]+
;

MINUTES_INTERVAL
:
    [1-59]
;

HOURS_INTERVAL
:
    [0-23]
;

WEEK_DAYS_INTERVAL
:
    [1-7]
;

MONTH_INTERVAL
:
    [1-12]
;

DAYS_OF_MONTH_INTERVAL
:
    [1-31]
;

DASH
:
    '-'
;

NUMBER
:
    [0-9]+
;



DOUBLE_QUOTE
:
    '"'
;

QUOTE
:
    '\''
;

SLASH
:
    '/'
;

DOT
:
    '.'
;

COMMA
:
    ','
;

UNDERSCORE
:
    '_'
;

ID
:
    [a-zA-Z] [a-zA-Z0-9]*
; 



REGEX
:
    (
        ID
        | DOT
        | ASTERISK
        | NUMBER
        |PERCENT
    )+
;

WS
:
    [ \t\r\n]+ -> skip
; // skip spaces, tabs, newlines

/**
 * Define a grammar called Hello
 */
grammar FileTriggerValidator;

options
   {
    tokenVocab = FileTriggerLexer;
}

r
:
    (schedule
    | file_feed)+
;

expression
:
    schedule
    | file_feed
;

file_feed
:
    file_feed_name SEMICOLON source_file SEMICOLON source_host SEMICOLON
    source_host SEMICOLON regEx SEMICOLON regEx
    (
        SEMICOLON source_host
    )*
;

formatString
:
    source_host
    (
        '%' source_host?
    )* DOT source_host
;

regEx
:
    REGEX
;

source_host
:
    ID
    (
        DASH ID
    )*
;

file_feed_name
:
    FILE_FEED
;

source_file
:
    (
        ID
        | DASH
        | UNDERSCORE
    )+
;

schedule
:
    SCHEDULE SEMICOLON schedule_defining SEMICOLON file_name SEMICOLON timezone

    (
        SEMICOLON NUMBER
    )?
;

schedule_defining
:
    cron
    | market_cron
    | combined_cron
;

cron
:
    CRON LBRACKET DOUBLE_QUOTE cron_part timezone DOUBLE_QUOTE RBRACKET
;

market_cron
:
    MARKET_CRON LBRACKET DOUBLE_QUOTE cron_part timezone DOUBLE_QUOTE COMMA
    DOUBLE_QUOTE ID DOUBLE_QUOTE RBRACKET
;

combined_cron
:
    COMBINED LBRACKET cron_list_element
    (
        COMMA cron_list_element
    )* RBRACKET
;

mic_defining
:
    ID
;

file_name
:
    (
        ID
        | DOT
        | QUOTE
        | DASH
    )+
;

cron_list_element
:
    cron
    | market_cron
;
//

schedule_defined_string
:
    cron
;
// 

cron_part
:
    minutes hours days_of_month month week_days
;
//

minutes
:
    MINUTES_INTERVAL
    | with_step_value
;
//

hours
:
    HOURS_INTERVAL
    | with_step_value
;
//

int_list
:
    INTEGER
    (
        COMMA INTEGER
    )*
;

interval
:
    INTEGER DASH INTEGER
;
//

days_of_month
:
    DAYS_OF_MONTH_INTERVAL
    | with_step_value
;
//

month
:
    MONTH_INTERVAL
    | with_step_value
;
//

week_days
:
    WEEK_DAYS_INTERVAL
    | with_step_value
;
//

timezone
:
    timezone_part
    (
        SLASH timezone_part
    )?
;
//

timezone_part
:
    ID
    (
        UNDERSCORE ID
    )?
;
//

with_step_value
:
    (
        int_list
        | interval
        | ASTERISK
    ) STEP?
;
//

Lucas Trzesniewski · Answer 1 · 2017-09-17T19:26:25.353

You basically made the #1 lexer mistake.

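Lexer rules have defined priority rules: the lexer picks the rule that matches the longest input, and when several rules match the same length, the one defined first in the grammar wins. In your case INTEGER is defined before NUMBER, so the INTEGER rule takes priority and the lexer never emits a NUMBER token. Both rules have the same definition, so you can simply replace all NUMBER occurrences with INTEGER.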
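
To make the tie-breaking concrete, here is a small Python sketch. It is a toy model, not ANTLR itself: the `pick_token` helper and the `RULES` list are made up for illustration, and Python's `re` stands in for the lexer's matching.

```python
import re

# Toy model of the tie-breaking described above, not ANTLR's implementation:
# the longest match wins, and on a tie the rule defined first wins.
RULES = [
    ("INTEGER", r"[0-9]+"),  # defined first in the lexer grammar
    ("NUMBER", r"[0-9]+"),   # identical pattern, defined later
]


def pick_token(text, rules=RULES):
    """Return (rule_name, matched_text) for the rule that wins at the start of text."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strictly greater: an equally long later match never replaces an earlier one.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[:best_len]


print(pick_token("20"))  # ('INTEGER', '20')
```

With the two rules swapped, the same input would come out as NUMBER, which is why the order of definition matters even though the patterns are identical.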

Note that your *_INTERVAL rule definitions don't mean what you think. Square brackets define a character set, not a numeric range. For instance DAYS_OF_MONTH_INTERVAL (defined as [1-31]) matches a single character that is either in the range 1-3 or is the character 1, which means it matches 1, 2 or 3 and nothing else. It is also shadowed by the INTEGER rule, just like your NUMBER rule.
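
You can check the character-set reading quickly outside ANTLR. The sketch below uses Python's `re` as a stand-in, on the assumption that a bracketed set like this one is read the same way there (a range plus a single extra character):

```python
import re

# [1-31] is the range '1'-'3' plus the single character '1',
# so it matches exactly one character: 1, 2 or 3.
days = re.compile(r"[1-31]")

matched = [d for d in "0123456789" if days.fullmatch(d)]
print(matched)               # ['1', '2', '3']
print(days.fullmatch("31"))  # None: two characters never match a one-character set
```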

Drop all these *_INTERVAL rules, and keep only the INTEGER rule. Remember that lexing is an independent pass, and the parser has no influence on it. Don't try to validate your cron expressions within the grammar, you'll have a very tough time. First, parse your file, and then perform a separate validation pass over the result.
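
A separate validation pass could look roughly like the Python sketch below. Everything in it is hypothetical: the `FIELD_RANGES` table and the `validate_field` helper are illustrative names, and the ranges are simply what the *_INTERVAL rules in the question appear to intend (standard cron minutes start at 0, so adjust as needed). It assumes you have already collected the INTEGER values for each cron field from the parse tree.

```python
# Allowed range per cron field, copied from the intent of the question's
# *_INTERVAL rules. These are assumptions, not an official cron definition.
FIELD_RANGES = {
    "minutes": (1, 59),
    "hours": (0, 23),
    "days_of_month": (1, 31),
    "month": (1, 12),
    "week_days": (1, 7),
}


def validate_field(name, values):
    """Return error messages for integers outside the field's allowed range."""
    low, high = FIELD_RANGES[name]
    return [
        f"{name}: {v} is outside {low}-{high}"
        for v in values
        if not low <= v <= high
    ]


print(validate_field("hours", [0, 12, 24]))  # ['hours: 24 is outside 0-23']
```

Keeping the range checks in ordinary code like this also lets you report a meaningful message per field instead of a generic syntax error.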
