ANTLR grammar Identifier for COBOL

Question

I am writing a grammar for the COBOL language, and I have made a rule to identify words in COBOL. My identifier rule is

IDENTIFIER : [a-zA-Z0-9]+ ([-_]+ [a-zA-Z0-9]+)*;

It works fine for most of my cases, but when I test it on the following input

0000-MAIN-ROUTINE

it does not work. Please share your thoughts on what I am doing wrong. How can I solve this issue?

In regexp as I know them, you would need to escape the dash character: `IDENTIFIER : [a-zA-Z0-9]+ ([\-_]+ [a-zA-Z0-9]+)*;` nevermind, seems like I am one of those folks that escape everything because I don't know how it works :) For better answers, I think you should post your grammar file. — A wild elephant, Sep 23 '15 at 08:59
Likely that the dash is being treated as a range operator. Placing it last in the set: `[_-]` will force treatment as an ordinary character. — GRosenberg, Sep 24 '15 at 20:29

score 0 · Answer 1 · edited May 23 '17 at 12:22

According to "Regex - Should hyphens be escaped?", a hyphen inside a character set is treated as a literal character instead of a range operator if it is either first or last in the set. That might not apply to ANTLR4's regex-like lexer token definitions.

Separately, there are a couple of problems with your proposed definition of a COBOL word:

IDENTIFIER : [a-zA-Z0-9]+ ([-_]+ [a-zA-Z0-9]+)*;

A COBOL word has the following rules:

composed of the characters [A-Za-z0-9_-]
may not start or end with a - dash
may not start with an _ underscore
must contain at least one upper or lower case alpha [A-Za-z]
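
As a sanity check, the four rules above can be written out one by one in Python. This is only a sketch of the rules as listed here; the helper name `is_cobol_word` is made up for illustration and it does not cover anything else the language reference may require.

```python
import string

ALPHA = set(string.ascii_letters)
ALLOWED = ALPHA | set(string.digits) | {"-", "_"}


def is_cobol_word(word: str) -> bool:
    """Check a candidate against the four rules listed above (hypothetical helper)."""
    if not word or not set(word) <= ALLOWED:
        return False  # rule 1: only letters, digits, '-' and '_'
    if word[0] == "-" or word[-1] == "-":
        return False  # rule 2: no leading or trailing dash
    if word[0] == "_":
        return False  # rule 3: no leading underscore
    return any(c in ALPHA for c in word)  # rule 4: at least one letter


for candidate in ["0000-MAIN-ROUTINE", "WS-TOTAL_", "12345", "-MAIN", "_MAIN", "MAIN-"]:
    print(candidate, is_cobol_word(candidate))
```

Only the first two candidates pass; the others each break one of the rules.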

I see two problems with the proposed definition above:

does not allow an underscore as the final character
does not require an alpha character. For example, the above definition allows all digits.
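
Both problems can be reproduced outside ANTLR with a Python transliteration of the rule from the question. This assumes that Python's `re` character classes behave like ANTLR4 lexer sets for these characters, which may not hold for the dash; the names `ORIGINAL` and `matches_original` are only for illustration.

```python
import re

# Python version of: [a-zA-Z0-9]+ ([-_]+ [a-zA-Z0-9]+)*
# Assumption: the dash placed first in [-_] is a literal dash, as it is in Python.
ORIGINAL = re.compile(r"[a-zA-Z0-9]+(?:[-_]+[a-zA-Z0-9]+)*")


def matches_original(word: str) -> bool:
    """Return True if the whole word matches the question's rule."""
    return ORIGINAL.fullmatch(word) is not None


print(matches_original("WS-TOTAL_"))          # False: trailing underscore rejected
print(matches_original("12345"))              # True: all digits accepted
print(matches_original("0000-MAIN-ROUTINE"))  # True under Python's reading of [-_]
```

Note that under Python's semantics the pattern does accept 0000-MAIN-ROUTINE, so the failure reported in the question may come from how ANTLR reads the dash inside the set, or from another rule in the grammar, and not from the overall shape of this rule.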

I suggest the following ANTLR4 lexer definition for a COBOL word:

IDENTIFIER : ([0-9][0-9_-]*)? [A-Za-z] ([A-Za-z0-9_-]*[A-Za-z0-9_])? ;
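
A Python transliteration of this rule can be tried against a few words. The `*` quantifiers on the variable-length parts are an assumption about the intended form, since without them the rule can match at most five characters. It also assumes a dash placed last in a set is literal, which is true for Python's `re` but which I have not verified for ANTLR4. The names `SUGGESTED` and `matches_suggested` are only for illustration.

```python
import re

# Assumed form: ([0-9][0-9_-]*)? [A-Za-z] ([A-Za-z0-9_-]*[A-Za-z0-9_])?
#   optional prefix starting with a digit, then the first letter,
#   then an optional tail that may not end with a dash.
SUGGESTED = re.compile(
    r"(?:[0-9][0-9_-]*)?[A-Za-z](?:[A-Za-z0-9_-]*[A-Za-z0-9_])?"
)


def matches_suggested(word: str) -> bool:
    """Return True if the whole word matches the assumed rule."""
    return SUGGESTED.fullmatch(word) is not None


for candidate in ["0000-MAIN-ROUTINE", "WS-TOTAL_", "A", "12345", "-MAIN", "_MAIN", "MAIN-"]:
    print(candidate, matches_suggested(candidate))
```

The first three words match and the last four do not, which agrees with the rules listed earlier in this answer.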

// IBM Enterprise COBOL Language Reference V4.2
// Enterprise COBOL for z/OS
// Language Reference
// Version 4 Release 2
// SC23-8528-01
// Second Edition (August 2009)
// Page 9
// PDF page 31

Just a note: I tried the suggested identifier in Antlr4, and it isn't even close. I suspect the author was "winging it". — batpox, Mar 23 '18 at 21:32
