This question is related to "RegEx: Grabbing values between quotation marks", which I've tried to implement in my actual code, without success.

What I'd like to accomplish is to parse PHP code and grab the literal double-quoted strings inside it, in order to automatically fix wrong, bad, or insecure things.

Solutions using token_get_all() are not an option, as the PHP code may not parse correctly (invalid, broken, or old PHP 4 code).

The regular expression should:

  1. Match only if a double quote is not preceded by a single quote
  2. Match only if a double quote is not followed by a single quote
  3. Also match backslash-escaped characters (such as \") inside the double-quoted string
  4. Leave the leading and trailing double quotes untouched (return them as part of the match)

To give an example of what the regexp should match, consider these snippets of (ugly, old, and insecure) PHP code:

header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
$sql = "UPDATE $table_name SET
password = password('$newpass'), pchange = '1'
WHERE email = '$email'";
$var = '"' . $something . '"';
$msg = "<p><a href=\"login.html\">Login</a></p>";
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";

The regular expression should match:

  1. "Last-Modified: "
  2. "D, d M Y H:i:s"
  3. " GMT"
  4. "UPDATE $table_name SET password = password('$newpass'), pchange = '1' WHERE email = '$email'"
  5. "<p><a href=\"login.html\">Login</a></p>"
  6. "<label for=\"whatever\">LABEL</label><select class='"
  7. "'>"

The regexp will be used within a preg_match() with PREG_OFFSET_CAPTURE, to restart the search where the last match occurred, in this way:

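$string_match = preg_match(**REGEXP_HERE**, $php_code, $text_in_double_quotes, PREG_OFFSET_CAPTURE, $last_pos);
if ($string_match) {
    list($text_in_double_quotes, $last_pos) = $text_in_double_quotes[0];
    // Move past this match so the next search does not find it again.
    $last_pos += strlen($text_in_double_quotes);
}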

Thank you!

P.S.

For those asking why I'm bothering to do this, here's a working demo using the regular expression suggested by @bobblebubble. It shows exactly why I'm looking for such a particular regex (and why I can't use preg_match_all in this case).

Maurizio
  • String interpolation vs. concatenation isn't the security problem; the problem is allowing user-controlled input into the string in the first place. Please look into prepared statements. – Chris Haas Dec 30 '22 at 17:03
  • Don't try to use a regexp or validation for that. Just use prepared statements and there will be no risk, as long as the charset and data type are set correctly. – JoelCrypto Dec 30 '22 at 17:09
  • @ChrisHaas the question is not about secure PHP patterns for string concatenation in SQL vs. prepared statements, which I'm well aware of. It is about writing a regexp that matches double-quoted strings, including escaped double quotes, and that is resilient to single quotes at the start/end. Thank you for your contribution, though. – Maurizio Dec 31 '22 at 09:43
  • @bobblebubble thank you very much! Your solution is _partially_ working as expected. Here's a working sandbox that uses the regexp you provided: https://php.land/s/63b005d560fd0929369096. It did not match the last `"'>"` after ` – Maurizio Dec 31 '22 at 10:05
  • @bobblebubble the sandbox link I've provided is not persistent. Here's a gist with the actual prototype code: https://gist.github.com/mauriziofonte/bfbddbe5e8e5cf87a4a9ffce9dc55312 – Maurizio Dec 31 '22 at 10:09
  • @Maurizio I posted an answer with a different concept (skip single quoted). Let me know if this works. – bobble bubble Jan 01 '23 at 11:36

1 Answer

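You could use the backtracking control verbs (*SKIP)(*F) to exclude single-quoted substrings. In the pattern below, (?!) is used in place of (*F); both simply force a failure at that point.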

$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

See this demo at regex101. The underlying pattern is from this answer.
To extract multiple items, use this regex with preg_match_all like this:

if(preg_match_all($regex, $str, $out) > 0) {
  print_r($out[0]);    
}

Here is a PHP demo at tio.run; the matches will be in $out[0] (full pattern matches).
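
Since the question asks for preg_match() with PREG_OFFSET_CAPTURE rather than preg_match_all(), here is a minimal sketch of how the same pattern might be driven in a loop that restarts after each match. The helper name find_double_quoted() and the sample $code are only illustrations of mine, not something from the question, and the array destructuring assumes PHP 7.1 or later.

```php
<?php
// Same pattern as above: skip single-quoted strings, match double-quoted ones.
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// Hypothetical helper (not from the question): collects [text, offset] pairs
// by calling preg_match() repeatedly, resuming after the previous match.
function find_double_quoted(string $regex, string $php_code): array
{
    $found = [];
    $last_pos = 0;
    while (preg_match($regex, $php_code, $m, PREG_OFFSET_CAPTURE, $last_pos) === 1) {
        [$text, $pos] = $m[0];
        $found[] = [$text, $pos];
        // A double-quoted match is at least two bytes long, so this always advances.
        $last_pos = $pos + strlen($text);
    }
    return $found;
}

// Sample input taken from two lines of the question (nowdoc keeps backslashes as-is).
$code = <<<'PHP'
$var = '"' . $something . '"';
echo "<label for=\"whatever\">LABEL</label><select class='".$style."'>";
PHP;

foreach (find_double_quoted($regex, $code) as [$text, $pos]) {
    echo $pos, ': ', $text, PHP_EOL;
}
```

Because each search starts right after the previous double-quoted match, it never begins inside a string literal, as long as the input quotes are balanced. If the code being scanned is broken in a way that leaves a quote unclosed, the results after that point may be off.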

bobble bubble
  • Thank you! Your solution is working as expected. Please, also have a look at the original question, where I posted an example on TIO.run (thank you also for this one, I did not know this sandbox) in the P.S. section. That's why I cannot use preg_match_all(). – Maurizio Jan 03 '23 at 07:57
  • @Maurizio Maybe you can use `preg_replace_callback`. Anyway, glad it works. – bobble bubble Jan 03 '23 at 12:26
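
To illustrate the preg_replace_callback() suggestion from the last comment, here is a rough sketch under my own assumptions. The function fix_literal() is a hypothetical placeholder for whatever rewrite is actually needed (here it only appends a marker comment), and the one-line $code sample is made up for the demonstration.

```php
<?php
// Same pattern as in the answer: single-quoted strings are skipped,
// double-quoted strings are matched.
$regex = '/\'[^\'\\\]*(?:\\\.[^\'\\\]*)*\'(*SKIP)(?!)|"[^"\\\]*(?:\\\.[^"\\\]*)*"/';

// Hypothetical fixer: stands in for the real "fix this literal" logic.
function fix_literal(string $literal): string
{
    return $literal . '/*checked*/';
}

// Made-up sample: two double-quoted literals around a single-quoted one.
$code = 'echo "a" . \'"b"\' . "c";';

// Each double-quoted literal is passed to the callback; everything else,
// including the skipped single-quoted string, is copied through unchanged.
$fixed = preg_replace_callback($regex, function (array $m): string {
    return fix_literal($m[0]);
}, $code);

echo $fixed, PHP_EOL;
// prints: echo "a"/*checked*/ . '"b"' . "c"/*checked*/;
```

This avoids tracking offsets by hand, at the cost of doing the whole replacement in one pass.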