
The following regex matches substrings inside quotation marks:

^("[^"]*")$

"Dialogue," she said. "More dialogue."

I don't want to capture the quotation marks, only what's inside them. So I figured I should use a lookahead and a lookbehind:

^((?<=")[^"]*(?="))$

But now the regex isn't matching anything.

Why is this, and how can I fix it?

https://regexr.com/5spdt

EDIT: Removing the outer capture group (and the ^ and $ anchors) kind of worked, but now `she said` is being captured too: (?<=")[^"]*(?=")

alexchenco

2 Answers


You get too many matches because the assertions do not consume the ", so the text between any two double quotes is a match, including the text between a closing quote and the next opening quote.
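To see this concretely, here is a minimal JavaScript sketch (assuming an engine with lookbehind support, i.e. ES2018 or later) that runs both patterns from the question on the sample sentence. The anchored pattern can never match: ^ marks a position with no character before it, so (?<=") cannot succeed there, and (?=") needs a " right after the position where $ requires the end of the string. Without the anchors, the lookarounds leave the quotes unconsumed, so the text between a closing quote and the next opening quote qualifies too.

```javascript
const questionText = '"Dialogue," she said. "More dialogue."';

// Pattern from the question: ^ and (?<=") contradict each other at the start,
// and (?=") and $ contradict each other at the end, so nothing can match.
const anchoredMatches = questionText.match(/^((?<=")[^"]*(?="))$/g);

// Pattern from the EDIT: the quotes are only asserted, never consumed, so the
// closing quote of one string also serves as an "opening" quote for the next run.
const unanchoredMatches = questionText.match(/(?<=")[^"]*(?=")/g);

console.log(anchoredMatches);   // null
console.log(unanchoredMatches); // [ 'Dialogue,', ' she said. ', 'More dialogue.' ]
```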

You can assert a " to the left, then match every character except " until you can assert a " to the right that is followed by optional pairs of "" up to the end of the string, so that an even number of double quotes remains after the closing one.

This assumes there are no escaped double quotes between the double quotes:

 (?<=")[^"]*(?="(?:[^"]*"[^"]*")*[^"]*$)
  • (?<=") Positive lookbehind, assert " directly to the left of the current position
  • [^"]* Match 0+ times any char except "
  • (?= Positive lookahead, assert to the right
    • " Match closing "
    • (?:[^"]*"[^"]*")* Match optional pairs of ""
    • [^"]*$ Match optional chars other than " and assert the end of the string
  • ) Close lookahead
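A minimal JavaScript sketch applying this pattern to the sample sentence from the question, again assuming lookbehind support (ES2018 or later). The g flag makes match() return every match rather than only the first; the text between the two quoted strings is rejected because the quote right after it is followed by an odd number of quotes.

```javascript
// Lookbehind for the opening quote, then a lookahead that requires the closing
// quote to be followed by an even number of quotes up to the end of the string.
const quotedOnly = /(?<=")[^"]*(?="(?:[^"]*"[^"]*")*[^"]*$)/g;

const dialogue = '"Dialogue," she said. "More dialogue."';
const insideQuotes = dialogue.match(quotedOnly);

console.log(insideQuotes); // [ 'Dialogue,', 'More dialogue.' ]
```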


The fourth bird

KISS

The regex in the question is overly specific (exploded):

^        # Start of string
(        # Begin capturing group
"        # Literal opening quote
[^"]*    # Zero or more characters other than a quote
"        # Literal closing quote
)        # End capturing group
$        # End of string

This will only match strings of the form:

"some string"

It would not, for example, match strings of the form:

anything "some string"   (does not start with a quote)
"some string" anything   (does not end with a quote)

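A quick JavaScript check of those three forms against the anchored pattern from the question (a minimal sketch; RegExp.prototype.test only reports whether the pattern matches somewhere in the string, which here means the whole string because of ^ and $):

```javascript
// The anchored pattern from the question, without lookarounds.
const anchoredQuote = /^("[^"]*")$/;

const examples = [
  '"some string"',
  'anything "some string"',
  '"some string" anything',
];

// Only a string that is exactly one quoted string passes.
const results = examples.map((s) => anchoredQuote.test(s));

console.log(results); // [ true, false, false ]
```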
So, given that the goal is to capture quoted strings, just don't include the quotes in the capturing group:

"([^"]*)"

Then reference the capturing group rather than the whole matching string.

Applied to JavaScript

Consider the following code:

const input = '"one" something "two" something "three" etc.';
const regex = /"([^"]*)"/; // no g flag: returns the first match and its groups
const match = input.match(regex);

The resulting match array contains ["\"one\"", "one"]: entry 0 is the full matching string and entry 1 is the first capturing group. Adapt the JS code as needed.
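The snippet above uses match() without the g flag, so it only returns the first quoted string. A minimal sketch for collecting the capturing group of every match, assuming an engine with String.prototype.matchAll (ES2020 or later):

```javascript
const source = '"one" something "two" something "three" etc.';

// matchAll requires the g flag. Each result has the same shape as a
// non-global match(): index 0 is the full match, index 1 the first group.
const quoted = Array.from(source.matchAll(/"([^"]*)"/g), (m) => m[1]);

console.log(quoted); // [ 'one', 'two', 'three' ]
```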

AD7six
  • I think you're right. I'm doing `.replace`, so I guess I can write `"..."` in the replace argument if I want to keep the quotes. – alexchenco May 12 '21 at 13:55
  • Er, OK, so it's an XY problem. Please always ask about what you want to do, not how you are currently trying to solve it (:. – AD7six May 12 '21 at 13:59
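Since the goal turned out to be a .replace, here is a minimal sketch of that approach: the quotes are part of the match, so the replacement writes them back, using $1 in a replacement string (or the argument after the full match in a replacer function) for the captured text. The <...> wrapping and the upper-casing are hypothetical transformations chosen only for illustration.

```javascript
const line = '"Dialogue," she said. "More dialogue."';

// $1 refers to the first capturing group; the quotes are written back explicitly.
const bracketed = line.replace(/"([^"]*)"/g, '"<$1>"');

// A replacer function receives the full match first, then each capturing group.
const shouted = line.replace(/"([^"]*)"/g, (_match, inner) => `"${inner.toUpperCase()}"`);

console.log(bracketed); // "<Dialogue,>" she said. "<More dialogue.>"
console.log(shouted);   // "DIALOGUE," she said. "MORE DIALOGUE."
```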