PREG:

(?<CV>\$*\w+\s*)\s*\((\s*(?<PRM>(\g<0>)|(?<STRING>(?<Q>['"])[^(?P=Q)]*(?P=Q))|(\g<CV>))\s*([\,]\s*(\g<PRM>))*)?\s*\)

Here's a regex I wrote for PCRE2. It matches most PHP function calls, including more complex nested calls such as:

bar("",bar($str,CONST,func($s,o)))

But I ran into a problem I couldn't solve: strings with mismatched double quotes, like these:

bar("string"",bar($str,CONST,func($s,o))) //1
bar("string\",bar($str,CONST,func($s,o))) //2
bar("",bar($str,CONST,func($s,o))) // The regex handles this case, but the problem appears when I try to support both cases above as well
  • Any chance you could use a proper parser instead? https://github.com/nikic/PHP-Parser – zerkms Dec 28 '22 at 02:26
  • Yes, I need it, thanks. But this question makes me sick – cclilshy Dec 28 '22 at 02:31
  • `[^(?P=Q)]*` is probably not doing what you might think it does. It matches [characters besides](https://www.regular-expressions.info/charclass.html#negated) `)(?=QP`. You could use a [tempered greedy token](https://www.rexegg.com/regex-quantifiers.html#tempered_greed) e.g. `(?:(?!(?P=Q)).)*` but it's more efficient if you used an [unrolled](https://www.softec.lu/site/RegularExpressions/UnrollingTheLoop) pattern, e.g. [`"(?:[^"\\]*(?:\\.[^"\\]*)*)"|'(?:[^'\\]*(?:\\.[^'\\]*)*)'`](https://regex101.com/r/jhT8v7/1) also considering escaped quotes. – bobble bubble Dec 28 '22 at 03:18
  • You really cleared up my confusion. I learned something new; Stack Overflow is more wonderful because of you – cclilshy Dec 28 '22 at 03:23
  • @cgxg Glad that helped, yes I learn a lot here on Stackoverflow all the time. :) – bobble bubble Dec 28 '22 at 03:34

1 Answer

Thanks @bobble, the problem has been solved. I can write the single-quoted and double-quoted forms as two alternatives joined by |, which avoids a lot of unnecessary problems.


Edit from @bobblebubble:

It looks like you expected the backreference (?P=Q) to work inside the character class [^(?P=Q)]*, but unfortunately it does not. Inside a character class the sequence is taken literally, so the negated class matches any character besides )(?=QP.
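As a rough check of that point, here is a small sketch using Python's `re` module. It assumes Python treats the contents of a character class the same way PCRE2 does for this pattern, and it writes the named group as `(?P<Q>...)`. The names `BROKEN`, `CLASS_ONLY`, and `sample` are made up for illustration.

```python
import re

# The question's string sub-pattern, using Python's (?P<Q>...) group syntax.
BROKEN = re.compile(r"""(?P<Q>['"])[^(?P=Q)]*(?P=Q)""")

# Inside [...], "(?P=Q)" is six literal characters, not a backreference.
CLASS_ONLY = re.compile(r"[^(?P=Q)]")

# Every character of "(?P=Q)" is rejected by the negated class ...
excluded = [ch for ch in "(?P=Q)" if CLASS_ONLY.fullmatch(ch) is None]
# ... while a quote character is accepted by it.
quote_allowed = CLASS_ONLY.fullmatch('"') is not None

# Because quotes are allowed inside the class, the match runs past the
# first closing quote and only stops at the last one.
sample = '"a" + "b"'
overshoot = BROKEN.search(sample).group(0)

print(excluded)       # ['(', '?', 'P', '=', 'Q', ')']
print(quote_allowed)  # True
print(overshoot)      # "a" + "b"
```

The last line shows the practical symptom: two separate string literals are swallowed as one match.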

For such a scenario you would usually use what is known as a tempered greedy token, which looks like (?:(?!(?P=Q)).)* in your case. But for matching quoted parts that may contain escaped quotes, it's considerably more efficient and also more readable to alternate between the options and use an unrolled pattern for each quote style. The (?<Q>['"])[^(?P=Q)]*(?P=Q) part can be replaced with:

"(?:[^"\\]*(?:\\.[^"\\]*)*)"|'(?:[^'\\]*(?:\\.[^'\\]*)*)'
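To compare the two approaches, here is a minimal sketch in Python's `re`, again assuming it behaves like PCRE2 for these constructs. `TEMPERED`, `UNROLLED`, `sample`, and `unterminated` are illustrative names, and the tempered variant is the plain one from the text, which does not handle escaped quotes.

```python
import re

# Tempered greedy token: consume any character that is not the opening quote.
# This variant gives backslash-escaped quotes no special treatment.
TEMPERED = re.compile(r"""(?P<Q>['"])(?:(?!(?P=Q)).)*(?P=Q)""")

# Unrolled pattern from the text, one alternative per quote style.
DOUBLE_QUOTED = r'"(?:[^"\\]*(?:\\.[^"\\]*)*)"'
SINGLE_QUOTED = r"'(?:[^'\\]*(?:\\.[^'\\]*)*)'"
UNROLLED = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)

# A call with an escaped quote inside a double-quoted argument.
sample = r'''foo("a\"b", 'c')'''

# The unrolled pattern keeps the escaped quote inside the string.
print(UNROLLED.findall(sample))          # ['"a\\"b"', "'c'"]

# The tempered token stops at the escaped quote.
print(TEMPERED.search(sample).group(0))  # "a\"

# Case 2 from the question: the only closing quote is escaped, so the
# string never terminates and the unrolled pattern finds no match.
unterminated = r'bar("string\",bar($str,CONST,func($s,o)))'
print(UNROLLED.search(unterminated))     # None
```

Whether an unterminated string should fail to match or be reported in some other way depends on what the surrounding function-call regex is meant to do with it.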
