
I want to use this regex (["'])(?:(?=(\?))\2.)*?\1 from this answer to this question

But when I use it in a lex input file as follows:

DOUBLEQUOTE_CONTENTS (["'])(?:(?=(\?))\2.)*?\1

%%

{DOUBLEQUOTE_CONTENTS} { printf("here"); }

I get a large number of "unrecognized character" errors from lex. It chokes on the first ? character and many more after that. If I escape the ? characters, the regex doesn't match anymore.

How can I use the said regex in a lex input file?

Andrew S.

1 Answer


(F)lex implements neither lookahead ((?=...) and friends) nor non-greedy repetition (*?). It has no captures, so non-capturing parentheses ((?:...)) are redundant. Finally, it does not implement backreferences (\2).

In short, you can only use regular expressions which are really regular. See the flex manual to see what is permitted.

Here's a simple pattern which doesn't depend on lookaheads or backreferences:

["]([^"\\]|\\.)*["]|'([^'\\]|\\.)*'
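To show how that pattern could slot into the input file from the question, here is a minimal sketch of a complete flex file. The definition name QUOTED, the catch-all rule, the main function, and the file name quoted.l are illustrative assumptions, not part of the original question or answer. It also assumes flex rather than a traditional lex, since flex wraps an expanded definition in parentheses, which keeps the top-level alternation safe.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

QUOTED  ["]([^"\\]|\\.)*["]|'([^'\\]|\\.)*'

%%

{QUOTED}   { printf("string: %s\n", yytext); }
.|\n       { /* skip anything that is not a quoted string */ }

%%

int main(void)
{
    /* Read stdin until EOF, printing every quoted string found. */
    yylex();
    return 0;
}
```

Assuming the file is saved as quoted.l, it should build with `flex quoted.l && cc lex.yy.c -o quoted`. Because `.` does not match a newline in flex, a backslash followed by a line break is not treated as an escape here, so a string containing one would not be matched by this rule.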
rici
  • The pattern you provide is inefficient because of alternations. I suggest using the regex from my comment above. – Wiktor Stribiżew Dec 05 '16 at 22:22
  • @wiktor: in flex, alternations are free because flex creates a DFA. See this answer, which includes a benchmark: http://stackoverflow.com/a/26922380/1566221 – rici Dec 05 '16 at 22:35