
I've recently been tasked with writing an ANTLR3 grammar for a fictional language. Everything else seems fine, but I have a couple of minor issues I could use some help with:

1) Comments are between '/*' and '*/', and may not be nested. I know how to implement comments themselves ('/*' .* '*/'), but how would I go about disallowing their nesting?

2) String literals are defined as any sequence of characters (except for double quotes and new lines) in between a pair of double quotes. They can only be used in an output statement. I attempted to define this thus:

output : OUTPUT (STRINGLIT | IDENT) ;
STRINGLIT : '"' ~('\r' | '\n' | '"')* '"' ;

For some reason, however, the parser accepts

OUTPUT "Hello,
World!"

and tokenises it as "Hello, \nWorld. I have no idea where the exclamation mark or the closing " went. Could it have something to do with my whitespace rule?

WHITESPACE : ( '\t' | ' ' | '\n' | '\r' | '\f' )+ { $channel = HIDDEN; } ;

Any advice would be much appreciated - thanks for your time! :)

Sam Harwell

1 Answer

  1. The form you wrote already disallows nested comments. The token will stop at the first instance of */, even if multiple /* sequences appeared in the comment. To allow nested comments, you would have to write a lexer rule that handles the nesting explicitly.

  2. The problem here is that STRINGLIT does not allow a string to be split across multiple lines. Without seeing the rest of your lexer rules, I cannot tell you how this input will be tokenized, but it is clear from the STRINGLIT rule you gave that the sample input is not a valid string.
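To see concretely what the two rules accept, here is a rough sketch that emulates them with Python regular expressions. It is not ANTLR and does not reproduce the ANTLR 3 interpreter; the helper names `match_comment` and `match_string` are hypothetical, and the comment pattern assumes the `.*` loop stops at the first */, as described in point 1.

```python
import re

# Rough regex stand-ins for the ANTLR 3 lexer rules in the question.
# This emulates their matching behaviour only; it is not ANTLR itself.

# COMMENT : '/*' .* '*/' ;  modelled as a non-greedy loop ending at the first '*/'
COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# STRINGLIT : '"' ~('\r' | '\n' | '"')* '"' ;
STRINGLIT = re.compile(r'"[^\r\n"]*"')


def match_comment(text):
    """Return what the COMMENT rule would consume at the start of text, or None."""
    m = COMMENT.match(text)
    return m.group(0) if m else None


def match_string(text):
    """Return what the STRINGLIT rule would consume at the start of text, or None."""
    m = STRINGLIT.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    src = "/* outer /* inner */ rest */"
    tok = match_comment(src)
    print(repr(tok))             # '/* outer /* inner */'
    print(repr(src[len(tok):]))  # ' rest */' is left over for other rules

    print(repr(match_string('"Hello, World!"')))   # the whole literal matches
    print(repr(match_string('"Hello,\nWorld!"')))  # None: newline is excluded
```

The "nested" comment ends at the inner */, leaving ` rest */` to be lexed as ordinary input, and the string split across two lines is rejected outright. A real ANTLR-generated lexer may report the leftover or rejected text as errors rather than silently skipping it.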

NOTE: The input given in your original question was unclear, so I reformatted it in an attempt to show the exact input you were using. Can you verify that my edit properly represents the input?

Sam Harwell
  • Thanks so much 280Z28 - you're entirely right about the non-nested nature, I can't believe I didn't spot that! Your edit does properly represent the input - thank you! I need to clarify point 2: a STRINGLIT should NOT be allowed to be split across multiple lines, but using the syntax I have defined, the interpreter is allowing me to do so, with the output displayed above. I'm not sure at all why this is the case! – Stevie Ponder Apr 25 '13 at 16:33
  • The ANTLR 3 *interpreter* frequently produces incorrect results. Have you tried running the actual parser instead? – Sam Harwell Apr 25 '13 at 16:43