I am writing a grammar for the COBOL language, and I made a lexer rule to identify COBOL words. My identifier rule is:

IDENTIFIER : [a-zA-Z0-9]+ ([-_]+ [a-zA-Z0-9]+)*;

It works fine for most of my cases, but when I test it on the following input:

0000-MAIN-ROUTINE

it does not work. Please share your thoughts on what I am doing wrong. How can I solve this issue?

Travis Webb
Siddharth
  • But it's working fine in my other grammar. :( – Siddharth Sep 23 '15 at 08:03
  • In regexp as I know them, you would need to escape the dash character: `IDENTIFIER : [a-zA-Z0-9]+ ([\-_]+ [a-zA-Z0-9]+)*;` nevermind, seems like I am one of those folks that escape everything because I don't know how it works :) For better answers, I think you should post your grammar file. – A wild elephant Sep 23 '15 at 08:59
  • Likely that the dash is being treated as a range operator. Placing it last in the set: `[_-]` will force treatment as an ordinary character. – GRosenberg Sep 24 '15 at 20:29

1 Answer

According to "Regex - Should hyphens be escaped?", a hyphen is treated as a literal character rather than a range operator if it is the first or last character in the set. That might not apply to ANTLR4's regex-like lexer token definitions.

Separately, there are a couple of problems with your proposed definition of a COBOL word:

IDENTIFIER : [a-zA-Z0-9]+ ([-_]+ [a-zA-Z0-9]+)*;

A COBOL word has the following rules:

  • composed of the characters [A-Za-z0-9_-]
  • may not start or end with a - dash
  • may not start with an _ underscore
  • must contain at least one upper or lower case alpha [A-Za-z]
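The four rules above can be checked directly, outside of any lexer. The following is a minimal Python sketch; the helper name `is_cobol_word` is hypothetical, and it encodes only the rules listed here (no length limit or reserved-word check).

```python
import string

# Character classes taken from the rules listed above.
ALPHA = set(string.ascii_letters)
ALLOWED = ALPHA | set(string.digits) | {"-", "_"}


def is_cobol_word(word: str) -> bool:
    """Return True if `word` satisfies the four rules listed above.

    Hypothetical helper for illustration only.
    """
    if not word:
        return False
    # Rule 1: only letters, digits, dash, and underscore.
    if any(ch not in ALLOWED for ch in word):
        return False
    # Rules 2 and 3: no leading dash or underscore, no trailing dash.
    if word[0] in "-_" or word[-1] == "-":
        return False
    # Rule 4: at least one alphabetic character.
    return any(ch in ALPHA for ch in word)


if __name__ == "__main__":
    for candidate in ["0000-MAIN-ROUTINE", "12345", "-ABC", "ABC-", "TOTAL_"]:
        print(candidate, is_cobol_word(candidate))
```

A checker like this can serve as a reference when comparing candidate lexer rules against sample words.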

I see two problems with the proposed definition above:

  1. It does not allow an underscore as the final character.
  2. It does not require an alpha character. For example, the above definition allows all digits.
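Both problems can be reproduced with a rough Python `re` translation of the rule from the question. This is only an approximation: Python's `re` is not ANTLR4's lexer, and `fullmatch` tests the whole string, whereas a lexer takes the longest matching prefix. The names `QUESTION_RULE` and `matches_question_rule` are hypothetical.

```python
import re

# Python translation of the rule from the question. The spaces in the
# ANTLR rule only separate elements, so they are dropped here.
QUESTION_RULE = re.compile(r"[a-zA-Z0-9]+(?:[-_]+[a-zA-Z0-9]+)*")


def matches_question_rule(word: str) -> bool:
    """Return True if the whole word matches the question's rule."""
    return QUESTION_RULE.fullmatch(word) is not None


if __name__ == "__main__":
    # Problem 1: a trailing underscore is rejected.
    print(matches_question_rule("TOTAL_"))  # False
    # Problem 2: an all-digit word is accepted.
    print(matches_question_rule("12345"))   # True
```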

I suggest the following ANTLR4 lexer definition for a COBOL word:

IDENTIFIER : ([0-9][0-9_-]*)? [A-Za-z] ([A-Za-z0-9_-]*[A-Za-z0-9_])? ;
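To sanity-check a rule of this shape outside ANTLR, here is a rough Python `re` approximation. It assumes the two dash-permitting character classes are repeated with `*`, so that words of any length can match. The names `SUGGESTED` and `matches_suggested` are hypothetical, and Python's handling of a trailing `-` inside `[...]` may differ from ANTLR4's.

```python
import re

# Assumed shape: optional digit-led prefix, one mandatory letter, then an
# optional tail that must not end in a dash.
SUGGESTED = re.compile(
    r"(?:[0-9][0-9_-]*)?"                # optional prefix starting with a digit
    r"[A-Za-z]"                          # at least one alphabetic character
    r"(?:[A-Za-z0-9_-]*[A-Za-z0-9_])?"   # tail ending in a letter, digit, or underscore
)


def matches_suggested(word: str) -> bool:
    """Return True if the whole word matches the assumed pattern."""
    return SUGGESTED.fullmatch(word) is not None


if __name__ == "__main__":
    for candidate in ["0000-MAIN-ROUTINE", "TOTAL_", "12345", "ABC-", "_ABC"]:
        print(candidate, matches_suggested(candidate))
```

Under these assumptions the pattern accepts `0000-MAIN-ROUTINE` and `TOTAL_` and rejects the all-digit, trailing-dash, and leading-underscore cases. Whether ANTLR4 behaves the same way should be confirmed by testing the grammar itself.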

// IBM Enterprise COBOL Language Reference V4.2
// Enterprise COBOL for z/OS
// Language Reference
// Version 4 Release 2
// SC23-8528-01
// Second Edition (August 2009)
// Page 9
// PDF page 31
Community
Michael Howard
  • Just a note: I tried the suggested identifier in Antlr4, and it isn't even close. I suspect the author was "winging it". – batpox Mar 23 '18 at 21:32