
ANTLR 4:

I need to support a single-quoted string literal with escaped characters AND the ability to use double curly braces as an 'escape sequence' that will need additional parsing, so both of the examples below need to be supported. I'm less worried about the second example: it seems trivial once the first works and the rule does not match double curly braces.

1. 'this is a string literal with an escaped\' character'
2. 'this is a string {{functionName(x)}} literal with double curlies'

StringLiteral 
: '\'' (ESC | AnyExceptDblCurlies)*? '\'' ;

fragment 
ESC : '\\' [btnr\'\\];

fragment 
AnyExceptDblCurlies 
: '{' ~'{' 
| ~'{' .;

I've done a lot of research on this and understand that you can't negate a multi-character sequence. I have even seen a similar approach work in Bart's answer to this post: Negating inside lexer- and parser rules

But what I'm seeing is that in example 1 above, the escaped single quote is not being recognized and I get a parser error that it cannot match ' character'.

If I alter the string literal token rule to the following, it works:

StringLiteral 
: '\'' (ESC | .)*? '\'' ;

Any ideas how to handle this scenario better? I can deduce that the escaped character is getting matched by AnyExceptDblCurlies instead of ESC, but I'm not sure how to solve this problem.
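
One way to see the problem: in the original fragment, `~'{' .` accepts any two characters whose first is not `{`, so a backslash or even the closing quote can be consumed by that alternative instead of ESC. A minimal single-token sketch, assuming the literal only has to reject `{{` rather than tokenize it, keeps `'` and `\` out of the catch-all set so that ESC is the only alternative able to consume a backslash. The fragment name SingleBrace is hypothetical, and the rule has not been run against the full grammar.

```antlr
// Sketch only: SingleBrace is a hypothetical helper, not part of the original grammar.
StringLiteral
    : '\'' ( ESC | SingleBrace | ~['\\{] )* '{'? '\''   // '{'? allows a lone brace right before the closing quote
    ;

fragment ESC
    : '\\' [btnr'\\]              // the only alternative that may consume a backslash
    ;

fragment SingleBrace
    : '{' ( ESC | ~['\\{] )       // a lone '{' followed by an escape or an ordinary character; never '{{'
    ;
```

With this shape, example 1 should lex as a single StringLiteral, while a literal containing `{{` should fail to match, which leaves the template case to a parser-level or mode-based solution.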

  • Do you really need to tokenize the content of a string literal at this stage? You don't say what your grammar's use case is; I'm thinking of languages like C or C#, which usually leave parsing of literals to runtime functions such as printf, String.Format and the like – Cee McSharpface Feb 22 '17 at 23:17
  • @dlatikay, I need to be able to parse the case where the literal contains '{{x}}', so I can't defer until runtime. Are you suggesting it may be easier to handle this case on the parser rule level? – ichrisnichols Feb 22 '17 at 23:28
  • I see... yes, parser rule > it reminds me of [this one](http://stackoverflow.com/questions/1850468/parsing-string-interpolation-in-antlr) – Cee McSharpface Feb 23 '17 at 11:17

1 Answer

To parse the template definition out of the string pretty much requires handling in the parser. Use lexer modes to distinguish between string characters and the template name.

Parser:

parser grammar TesterParser ; // grammar name assumed; it must match the grammar file name

options {
    tokenVocab = TesterLexer ;
}

test : string EOF ;
string   : STRBEG ( SCHAR | template )* STREND ; // allow multiple templates per string
template : TMPLBEG TMPLNAME TMPLEND ;

Lexer:

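lexer grammar TesterLexer ;

tokens { SCHAR } // predeclared: SCHAR is only ever assigned via type(...)

STRBEG : Squote -> pushMode(strMode) ;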

mode strMode ;
    STRESQ  : Esqote  -> type(SCHAR) ; // predeclare SCHAR in tokens block
    STREND  : Squote  -> popMode ;
    TMPLBEG : DBrOpen -> pushMode(tmplMode) ;
    STRCHAR : .       -> type(SCHAR) ;

mode tmplMode ;
    TMPLEND  : DBrClose  -> popMode ;
    TMPLNAME : ~'}'+ ; // '+' so the rule can never match the empty string

fragment Squote : '\''   ;
fragment Esqote : '\\\'' ;
fragment DBrOpen   : '{{' ;
fragment DBrClose  : '}}' ;

Updated to correct the TMPLNAME rule, add main rule and options block.

GRosenberg
  • this looks like exactly what I need...I keep hitting a brick wall trying to use semantic predicates, but using a mode stack looks to be the ticket! Let me try this out...can't believe this scenario requires such a level of articulation. – ichrisnichols Feb 23 '17 at 03:06
  • GRosenberg, I cannot get even the most simple versions of your example to compile in ANTLR, targeting C# (.NET 4.5.2)...I corrected a few bugs in your sample, but even still I can't seem to match even a simple single quoted string...I will display my code in an update on the main thread. Please let me know if I'm missing something. – ichrisnichols Feb 24 '17 at 00:53
  • It appears the parser is missing the options block: `options { tokenVocab = TesterLexer ; }`. This is a standard requirement of all split grammars. – GRosenberg Feb 24 '17 at 04:04
  • Absolutely, I was wanting to get this incorporated into my larger grammar first, but you are right, the question was answered completely. Thanks again! – ichrisnichols Feb 25 '17 at 18:35
  • One other comment, @GRosenberg, I think the mode names need to start with a capital letter. – ichrisnichols Feb 25 '17 at 18:37