
This question is related to "RegEx: Grabbing values between quotation marks".

The RegEx from the best answer

(["'])(?:(?=(\\?))\2.)*?\1

tested with the Debuggex Demo, also matches strings that start with an escaped double quote. I tried to extend the definition with a negative lookbehind:

(["'](?<!\\))(?:(?=(\\?))\2.)*?\1

Debuggex Demo

However, this does not change what is matched. Any suggestions on how to exclude escaped single or double quotes as the opening quote?

I want to use this as a highlighting pattern in NEdit, which supports regex lookbehind.

Example of the desired matching:

<p>
  <span style="color: #ff0000">"str1"</span> notstr
  <span style="color: #ff0000">"str2"</span>
  \"notstr <span style="color: #ff0000">"str4"</span>
</p>
j-hap
  • These kinds of input strings should never occur. The regex can look like `(?<!\\)(?:\\\\)*(["'])(?:(?=(\\?))\2.)*?\1`, but you will have to adjust it further. – Wiktor Stribiżew Mar 28 '17 at 12:08
  • Thanks a lot, my rookie mistake regarding the lookbehind position, I guess... – j-hap Mar 28 '17 at 12:19
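
The lookbehind-position issue discussed above can be checked with a short sketch. It assumes Python's `re` module as a stand-in for the NEdit regex engine (the two may differ in details), and the variable and helper names are made up for illustration. A lookbehind placed after `["']` inspects the quote that was just consumed, so it can never see a backslash; placed before the quote, it inspects the preceding character.

```python
import re

# Sample line from the question: an escaped quote followed by a real string.
text = r'\"notstr "str4"'

# Pattern from the linked answer (no escape check on the opening quote).
original = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

# Attempt from the question: the lookbehind sits AFTER the quote, so the
# character it inspects is the quote itself, never a backslash.
after_quote = re.compile(r'''(["'](?<!\\))(?:(?=(\\?))\2.)*?\1''')

# Lookbehind moved in front of the quote: it now inspects the preceding character.
before_quote = re.compile(r'''(?<!\\)(["'])(?:(?=(\\?))\2.)*?\1''')


def matches(pattern, s):
    """Return the full text of every match of pattern in s."""
    return [m.group(0) for m in pattern.finditer(s)]


print(matches(original, text))      # ['"notstr "']
print(matches(after_quote, text))   # ['"notstr "']
print(matches(before_quote, text))  # ['"str4"']
```

Moving the lookbehind only handles a single backslash before the opening quote; it does not yet account for escaped backslashes.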

1 Answer


Using a negative lookbehind for a backslash that is not itself preceded by another backslash, i.e.

(?<!(?<!\\)\\)["']

solves the problem:

((?<!(?<!\\)\\)["'])(?:(?=(\\?))\2.)*?(?<!(?<!\\)\\)\1

Demo.
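
As a rough check of the pattern above, here is a minimal sketch that runs it on the plain-text version of the question's sample. It assumes Python's `re` module, which accepts the nested fixed-width lookbehind; NEdit's engine may behave differently. The names `pattern`, `sample`, and `found` are illustrative only.

```python
import re

# Pattern from the answer: a quote counts as a delimiter unless it is preceded
# by a backslash that is itself not preceded by another backslash.
pattern = re.compile(
    r'''((?<!(?<!\\)\\)["'])(?:(?=(\\?))\2.)*?(?<!(?<!\\)\\)\1'''
)

# Plain-text form of the example from the question.
sample = r'"str1" notstr "str2" \"notstr "str4"'

found = [m.group(0) for m in pattern.finditer(sample)]
print(found)  # ['"str1"', '"str2"', '"str4"']
```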

You should be very careful about this approach, because generally regex is not a good tool for parsing inputs in markup syntax. You would be better off using a full-scale parser, and then optionally applying regex to parts that you get back from it.

Sergey Kalinichenko
  • No, it does not actually solve the problem because it will fail a valid match if a quote is preceded with an escaped backslash representing a literal backslash. – Wiktor Stribiżew Mar 28 '17 at 12:20
  • @WiktorStribiżew You are right, I didn't think of escaping the escape character itself. Surprisingly to me, applying a negative look-behind inside a negative look-behind was allowed to patch for single occurrences of escaped backslashes. Of course now it will fail for escaped backslashes followed by non-escaped backslashes, but regex's inability to count puts a limit to how far this whole thing can be pushed. – Sergey Kalinichenko Mar 28 '17 at 12:31
  • You cannot use lookbehinds here like that. You will fail another valid match. The only "valid" (to some extent) approach is the one I mentioned in the comment. – Wiktor Stribiżew Mar 28 '17 at 12:33
  • Is it necessary to repeat the lookbehind for the last "\1", or does the backreference actually carry the lookbehind behaviour? Either way, in this special case it's not necessary, because the middle part takes care of quotes preceded by a backslash after the starting quote. – j-hap Mar 28 '17 at 12:41
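
Two points from these comments can be checked with a small sketch. It assumes Python's `re` module rather than NEdit's engine, and the names `nested`, `pairs`, `quoted`, `even`, and `odd` are made up for illustration. The first part compares the answer's nested-lookbehind pattern with the pattern suggested in the comment on the question, using inputs with two and three backslashes before a quote. The second part shows that a backreference matches only the captured text and does not re-apply a lookbehind from inside the group.

```python
import re

# Answer's pattern (nested lookbehind) and the pattern from the comment on
# the question (lookbehind followed by pairs of backslashes).
nested = re.compile(
    r'''((?<!(?<!\\)\\)["'])(?:(?=(\\?))\2.)*?(?<!(?<!\\)\\)\1'''
)
pairs = re.compile(r'''(?<!\\)(?:\\\\)*(["'])(?:(?=(\\?))\2.)*?\1''')


def quoted(pattern, s):
    """Return each match from its opening quote (group 1) to its end."""
    return [s[m.start(1):m.end()] for m in pattern.finditer(s)]


# Two backslashes (an escaped backslash), then a real string.
even = r'\\"str"'
# Three backslashes: an escaped backslash, then an escaped quote.
odd = r'\\\"x"'

print(quoted(nested, even), quoted(pairs, even))  # ['"str"'] ['"str"']
print(quoted(nested, odd), quoted(pairs, odd))    # ['"x"'] []

# A backreference repeats only the captured text; the lookbehind inside
# group 1 is not applied again at the position of \1.
print(bool(re.fullmatch(r'((?<!a)a)\1', 'aa')))   # True
```

With three backslashes the nested lookbehind accepts the escaped quote as an opening delimiter, which is the limitation acknowledged in the comments; the backslash-pairs pattern rejects it, at the cost of including the leading backslashes in the overall match (hence the use of group 1's start here).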