
I'm currently building a date parser using ANTLR. The inputs it takes are

year monthName numDayOfMonth era
numDayOfMonth monthName year era   

These are all under the rule stringDate, so my grammar looks like this:

stringDate: numYear strMonth numDayOfMonth
|           numDayOfMonth strMonth numYear;

numYear:               NUMBER ;
strMonth:              MONTH  ;
numDayOfMonth:         NUMBER ;

NUMBER:    [0-9]+ ;
MONTH:     'jan' | 'feb' | 'mar' | 'apr' | 'may' | 'jun' | 'jul' | 'aug' | 'sep' | 'sept' | 'oct' | 'nov' | 'dec' ;

In my listeners, I check that numDayOfMonth is within the range [1, 31] to make sure that the number is a valid day. I do the same for the months (first I transform them into their corresponding month number).

The problem is that if I input the date 2013 June 13, the date gets parsed correctly. However, when I input 13 June 2013, it gets parsed incorrectly because the parser gets confused and thinks 2013 is a day, not a year, and therefore the check fails during exitNumDayOfMonth. I've been scratching my head about how to handle this. I essentially want the evaluator to skip the rule if I encounter a number > 31, but I'm not entirely sure how to skip a rule. I have tried returning and throwing errors, but nothing seems to work.

Is there a way to make the evaluator skip this rule and go on to the alternative instead?

Mad Physicist
ceez
  • Doesn't ANTLR have a date class? In any case, if your year is always 4 digits, parse the string and find that first. Then find the non-numeric month and extract that. You'll then be left with the day. You could also check Stack Overflow; there are already a few threads on parsing dates in ANTLR. Here's one: https://stackoverflow.com/questions/35651518/how-to-create-a-antlr4-grammar-which-will-parse-date – John Lord Oct 17 '18 at 00:07

1 Answer


Why don’t you change the token definition of year to contain only 4 digits? That will solve the issue.

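So your year and day will be as follows (character sets like [0-9] are only allowed in lexer rules, so the digit patterns go into tokens that the parser rules reference):

YEAR: [0-9] [0-9] [0-9] [0-9] ;
DAY:  [0-9] [0-9]? ;

numYear:       YEAR ;
numDayOfMonth: DAY ;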

Currently, they both have the same definition (NUMBER), so the parser cannot tell which rule applies while parsing and goes with the first alternative that fits the input.
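
Putting this together, a minimal sketch of the whole grammar might look like the following. It rests on several assumptions that are not in the question: the grammar name StringDate, the WS rule that skips whitespace, the DIGIT fragment, the EOF at the end of each alternative, and the 'may' entry in MONTH are all added for illustration. It also assumes years are always exactly four digits, and it leaves out the era just as the question's grammar does.

```antlr
grammar StringDate;   // hypothetical grammar name

// Parser rules: the two alternatives now start with different token types,
// so the parser can pick the right one from the first token.
stringDate
    : numYear strMonth numDayOfMonth EOF
    | numDayOfMonth strMonth numYear EOF
    ;

numYear       : YEAR ;
strMonth      : MONTH ;
numDayOfMonth : DAY ;

// Lexer rules: the lexer prefers the longest match, so "2013" becomes YEAR
// (DAY could only consume two of its digits) while "13" becomes DAY.
YEAR  : DIGIT DIGIT DIGIT DIGIT ;
DAY   : DIGIT DIGIT? ;
MONTH : 'jan' | 'feb' | 'mar' | 'apr' | 'may' | 'jun' | 'jul' | 'aug'
      | 'sep' | 'sept' | 'oct' | 'nov' | 'dec' ;
WS    : [ \t\r\n]+ -> skip ;   // assumed: whitespace between parts is ignored

fragment DIGIT : [0-9] ;
```

With these tokens, both 2013 jun 13 and 13 jun 2013 should reach the intended alternative. DAY still accepts any one- or two-digit number, so the [1, 31] check in the listener remains useful. Month names are matched exactly as written in MONTH, so an input such as June would need extra handling that this sketch does not cover.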

mettleap