4

I'm trying to capture quoted strings without the quotes. I have this terminal

%token <string> STRING

and this production

constant:
    | QUOTE STRING QUOTE { String($2) }

along with these lexer rules

| '\''       { QUOTE }
| [^ '\'']*  { STRING (lexeme lexbuf) } //final regex before eof

It seems to be interpreting everything leading up to a QUOTE as a single lexeme, which doesn't parse. So maybe my problem is elsewhere in the grammar--not sure. Am I going about this the right way? It was parsing fine before I tried to exclude quotes from strings.

Update

I think there may be some ambiguity with the following lexer rules

let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*

The following rule is prior to STRING

| identifier    { ID (lexeme lexbuf) }

Is there any way to disambiguate these without including quotes in the STRING regex?

Daniel
  • Why don't you want to include quotes in the `STRING` regex? The ambiguity you note between `ID` and `STRING` is deep, why would you want two similar terminals like that? – Stephen Swensen Nov 21 '11 at 21:22
  • Obviously quotes aren't part of string literals themselves, but stripping them out in the parser is easy enough. You're right that removing them from the regex introduces too much ambiguity. – Daniel Nov 21 '11 at 21:28

3 Answers

6

It's pretty normal to do semantic analysis in the lexer for constants like strings and numeric literals, so you might consider a lex rule for your string constants like

| '\'' [^ '\'']* '\'' 
    { STRING (let s = lexeme lexbuf in s.Substring(1, s.Length - 2)) }
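The stripping step in that action can be tried outside the lexer. Below is a minimal sketch, assuming a hypothetical helper named `stripQuotes` (not part of fslex) that is fed a lexeme already matched by the rule above, so it always starts and ends with a single quote:

```fsharp
// Hypothetical helper mirroring the lexer action above: drop the first and
// last character of a lexeme that is known to be wrapped in single quotes.
let stripQuotes (s: string) =
    // The rule only matches text of the form '...', so s.Length >= 2 holds.
    s.Substring(1, s.Length - 2)

// A quoted lexeme loses its delimiters; the content is untouched.
printfn "[%s]" (stripQuotes "'hello world'")

// An empty string literal '' yields the empty string rather than failing.
printfn "[%s]" (stripQuotes "''")
```

Note that the regex gives no way to escape a quote inside the literal; if that is needed, both the rule and the stripping logic would have to change.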
Stephen Swensen
  • This is equivalent to trimming the quotes, no? – Daniel Nov 21 '11 at 21:34
  • Correct - the key difference from @Vitaliy's answer is that I'm doing the stripping in the lexer rather than the parser (which means if you want to use the `STRING` terminal other places in your parser, you don't have to trim it at each spot). – Stephen Swensen Nov 21 '11 at 21:36
  • Ah, yes...that's an important difference. Thanks. – Daniel Nov 21 '11 at 21:48
1

You can keep the quotes in the lexeme and trim them in the parser.

Lexer:

let constant       = ("'" ([^ '\''])* "'")
...
| constant         { STRING(lexeme lexbuf) } 

Parser:

%token <string> STRING

...
constant:
    | STRING { ($1).Trim([|'''|]) }

Or, if you want to keep the quotes as separate tokens:

Lexer:

let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*
...

| '\''       { QUOTE }
| identifier { ID (lexeme lexbuf) }
| _          { STRING (lexeme lexbuf) } 

Because the identifier rule matches first, it takes characters away from STRING, so the token stream can look like QUOTE ID STRING ID .. QUOTE, and you have to handle this in the parser:

Parser:

constant:
     | QUOTE content QUOTE     { String($2) }

content:
     | ID content      { $1+$2 }
     | STRING content  { $1+$2 }
     | ID              { $1 }
     | STRING          { $1 }
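To see what the `content` production computes, here is a rough sketch in plain F#. The `Token` type and the `content` and `constant` functions are hypothetical stand-ins for what fsyacc would generate from the grammar above, not the generated code itself:

```fsharp
// Hypothetical token type standing in for the fsyacc-generated one.
type Token =
    | QUOTE
    | ID of string
    | STRING of string

// Concatenate the leading run of ID/STRING tokens, as the recursive
// `content` production does, and return the tokens that follow it.
let rec content tokens =
    match tokens with
    | ID s :: rest
    | STRING s :: rest ->
        let tail, remaining = content rest
        s + tail, remaining
    | _ -> "", tokens

// QUOTE content QUOTE: succeeds only when a closing QUOTE follows.
// The grammar requires at least one content token; the non-empty
// check below approximates that requirement.
let constant tokens =
    match tokens with
    | QUOTE :: rest ->
        (match content rest with
         | text, QUOTE :: remaining when text <> "" -> Some(text, remaining)
         | _ -> None)
    | _ -> None

// With the lexer rules above, 'foo bar' would come out as
// QUOTE ID STRING ID QUOTE, since `_` matches one character at a time.
printfn "%A" (constant [ QUOTE; ID "foo"; STRING " "; ID "bar"; QUOTE ])
```

The pieces are glued back together in order, so the result is the original text between the quotes. As in the grammar, an empty literal is not accepted by this sketch.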
Vitaliy
0

I had a similar problem and solved it by capturing the strings in the "lexic.l" file using lexer states; see my self-answered question.

jbondia