2
/^"((?:[^"]|\\.)*)"/

Against this string:

"quote\_with\\escaped\"characters" more

It only matches until the \", although I've clearly defined \ as an escape character (and it matches \_ and \\ fine...).

Alan Moore
Core Xii

3 Answers

4

It works correctly if you flip the order of your two alternatives:

/^"((?:\\.|[^"])*)"/

The problem is that, with [^"] listed first, the \ is consumed by [^"] before the \\. alternative is ever tried, so the " that follows is read as the closing quote. It worked for \\ and \_ only because both characters in each of those pairs are matched, one at a time, by your [^"].
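To see the effect of the ordering, here is a minimal Python sketch that runs both patterns against the sample string from the question. The names `subject`, `original`, and `flipped` are made up for illustration, and it assumes Python's `re` module tries alternatives left to right in the same way as the engine used in the question.

```python
import re

# Sample string from the question; a raw literal keeps the backslashes as-is.
subject = r'"quote\_with\\escaped\"characters" more'

# Original order: [^"] is tried first and consumes each backslash by itself.
original = re.compile(r'^"((?:[^"]|\\.)*)"')
# Flipped order: \\. is tried first, so a backslash takes the next character with it.
flipped = re.compile(r'^"((?:\\.|[^"])*)"')

original_match = original.match(subject).group(0)
flipped_match = flipped.match(subject).group(0)

print(original_match)  # stops at the escaped quote
print(flipped_match)   # runs to the real closing quote
```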

VoteyDisciple
0

Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:

import re

# [^"\\] excludes the backslash, so only \\. can consume an escape pair.
x = re.compile(r'^"((?:[^"\\]|\\.)*)"')

s = r'"quote\_with\\escaped\"characters" more'

mo = x.match(s)
print(mo.group())

This prints "quote\_with\\escaped\"characters". I believe that in your version (which also cuts the match short if substituted in here) the "not a double quote" subexpression ([^"]) is swallowing the backslashes that you intend to escape the immediately following characters. All I'm doing here is ensuring that such backslashes are not swallowed in this way, and, as I said, it seems to work with this change.
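One case where excluding the backslash from the character class appears to matter is an unterminated string whose last quote is escaped. The sketch below is an illustration only: the test input and the names `order_only` and `strict` are assumptions, not part of the question. With the backslash still allowed in the class, the engine can backtrack and re-read the final \ as an ordinary character; with it excluded, that fallback is not available.

```python
import re

# Hypothetical input: the only quote after the opening one is escaped,
# so the string is never actually closed.
unterminated = r'"abc\"'

# Alternatives reordered, but [^"] can still match a backslash on backtracking.
order_only = re.compile(r'^"((?:\\.|[^"])*)"')
# Backslash excluded from the character class.
strict = re.compile(r'^"((?:[^"\\]|\\.)*)"')

order_only_result = order_only.match(unterminated)
strict_result = strict.match(unterminated)

print(order_only_result is not None)  # True: the escaped quote is taken as the closing quote
print(strict_result is None)          # True: no match for the unterminated string
```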

Alex Martelli
0

I don't mean to confuse matters; this is just another approach I've played around with. The regexp below (PCRE) tries not to match invalid syntax (e.g. a string that ends with \") and can be used with either ' or " as the delimiter.

/('|").*\\\1.*?[^\\]\1/

To use it with PHP:

<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>

For:

"quote\_with\\escaped\"characters"  "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"

it only matches:

"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'
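
For comparison, the same check can be sketched in Python. This is a port under assumptions rather than the PHP code itself: it assumes Python's `re` handles this pattern the way PCRE does, it tests each line separately with `re.search`, and the names `pattern`, `samples`, and `results` are made up for the example.

```python
import re

# Same pattern as above: \1 refers back to whichever quote character opened the string.
pattern = re.compile(r'''('|").*\\\1.*?[^\\]\1''')

samples = [
    r'''"quote\_with\\escaped\"characters"  "aaa"''',
    r"""'just \'another\' quote "example\"'""",
    r'''"Wrong syntax \"''',
    r'''"No escapes, no match here"''',
]

# Collect the matched text for each line, or None when nothing matches.
results = []
for line in samples:
    m = pattern.search(line)
    results.append(m.group(0) if m else None)

for line, matched in zip(samples, results):
    print(repr(line), '->', repr(matched))
```

Because the pattern requires an escaped delimiter somewhere inside the string, a quoted string without any escapes is rejected as well, which is why the last sample does not match.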
noomz