I'm trying to write a PCRE regular expression that searches PHP code for double-quoted strings. It needs to handle escaped double quotes, and it must skip double quotes that merely appear inside single-quoted strings, which happens when building HTML by concatenation, as in these lines:
$str = '<elem prop="' . $var . '">';
$str = '<div class="my-class ' . $my_var_class . ' my-other-class">';
So far I've been able to come up with a reliable regex that handles escaped double-quotes:
"(.*?)(?<!\\)"
This works for lines of code like these:
$str = "this is something";
$str = "this is {$another}";
$str = "could be {$hello['world']}";
$str = "and $hello[world] another";
$str = "'single quotes in double quotes'";
$str = "building <div style=\"width: 100%\" data-var=\"{$var}\"></div>";
But it doesn't work for lines of code like my first example above: it matches "' . $var . '", but I don't want it to match anything on that line.
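Here is a minimal reproduction of the behaviour described above, as a sketch that assumes the pattern is run through PHP's preg_match_all with / delimiters. The helper name findDoubleQuoted and the array keys are made up for illustration only; the pattern and the sample lines are the ones from this question.

```php
<?php
// The pattern from the question. Inside a single-quoted PHP string,
// four backslashes become the two-character regex escape \\ (one literal backslash).
$pattern = '/"(.*?)(?<!\\\\)"/';

// Hypothetical helper: returns every full match of $pattern in one line of code.
function findDoubleQuoted(string $pattern, string $line): array
{
    preg_match_all($pattern, $line, $matches);
    return $matches[0];
}

// Sample lines, stored in single-quoted strings so that $ and \" stay literal.
$lines = [
    'wanted'   => '$str = "this is {$another}";',
    'escaped'  => '$str = "building <div style=\"width: 100%\" data-var=\"{$var}\"></div>";',
    'unwanted' => '$str = \'<elem prop="\' . $var . \'">\';',
];

foreach ($lines as $label => $line) {
    $found = findDoubleQuoted($pattern, $line);
    echo $label, ': ', $found ? implode(' | ', $found) : '(no match)', PHP_EOL;
}
```

The first two lines produce the whole double-quoted string as a single match, which is what I want. The 'unwanted' line produces "' . $var . '", which is the false positive I'm trying to get rid of.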
I've tried using the principles discussed at https://stackoverflow.com/a/62558215 and https://stackoverflow.com/a/6464500, but a look-ahead isn't sufficient by itself, and I'm having a hard time coming up with a look-behind that doesn't give me a compilation error about "lookbehind assertion is not fixed length". I feel like the answer at https://stackoverflow.com/a/36186925/3404349 might (?) be getting close to what I'm looking for, but it seems to me that it's matching the inverse (of sorts) of my goal.