I need to use preg_match to extract a string between double quotes, where the string can contain any characters, including double quotes.

I've tried all the solutions in the question below, but none of them worked for my case: php to extract a string from double quote

Sample string: "ASD ""ASD ADS"""

I need to extract: ASD ""ASD ADS""

My current code works, except that I don't know how to handle the case above, which breaks the whole result:

$regex = '/"(.*)"/imU';
$content = file_get_contents($file->getRealPath());
$filename = $file->getClientOriginalName();

preg_match_all($regex, $content, $matches);

return $matches[0];
hxdef
  • Please post what code you have so far. – Nigel Ren Jun 16 '18 at 12:31
  • @NigelRen posted, can have a look – hxdef Jun 16 '18 at 12:33
  • Remove the `U` modifier from the regex – Pyton Jun 16 '18 at 12:34
  • You can always test regexes on a site like https://regex101.com/ to check them (including the flags you need) – Nigel Ren Jun 16 '18 at 12:36
  • @Pyton wow, that really worked, thank you so much! Could you briefly explain why this U broke the whole thing? Thanks =) – hxdef Jun 16 '18 at 12:37
  • One thing you'll have to be careful of is that it always looks for the last quote on the line, so it doesn't care whether the quotes are balanced or not (e.g. `"ASD ""ASD ADS""" s"`) – Nigel Ren Jun 16 '18 at 12:42
  • @NigelRen yeah, definitely, but now I'm wondering why removing "U" makes it work; even the more complex expressions that include double quotes didn't work properly before. – hxdef Jun 16 '18 at 12:44
  • `U` is a modifier that makes the regex [ungreedy](http://php.net/manual/en/reference.pcre.pattern.modifiers.php). In this case `.*` behaves like `.*?`, which matches only up to the first double quote it encounters instead of the last. – The fourth bird Jun 16 '18 at 12:46
  • Have a read of http://php.net/manual/en/reference.pcre.pattern.modifiers.php – Nigel Ren Jun 16 '18 at 12:47
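To see the difference the comments describe, here is a minimal standalone sketch that runs the question's pattern with and without the `U` modifier. It assumes an inline sample string in place of the uploaded file (`$file`) from the question:

```php
<?php
// Sample from the question (single quotes keep the double quotes literal).
$content = '"ASD ""ASD ADS"""';

// With U, the greedy .* becomes lazy, so each match stops at the next double quote.
preg_match_all('/"(.*)"/imU', $content, $ungreedy);
// $ungreedy[0] === ['"ASD "', '"ASD ADS"', '""']

// Without U, .* is greedy and backtracks only as far as the last double quote on the line.
preg_match_all('/"(.*)"/im', $content, $greedy);
// $greedy[0] === ['"ASD ""ASD ADS"""']

// The caveat from the comments: greedy matching ignores whether the quotes are balanced.
preg_match_all('/"(.*)"/im', '"ASD ""ASD ADS""" s"', $unbalanced);
// $unbalanced[0] === ['"ASD ""ASD ADS""" s"']

print_r($greedy[0]);
```

Removing `U` fixes the sample only because the whole line happens to be one quoted string; the last call shows that a stray quote later on the line gets swallowed too.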

1 Answer

To handle two adjacent double quotes between the opening and closing quotes correctly, the content between them must be matched with two alternatives: either a character other than a double quote, or two consecutive double quotes.

So the regex can be as follows:

/"(?:[^"]|"")+"/g

Description:

  • " - Match the opening double quote.
  • (?: - Start of a non-capturing group, needed because the + quantifier after it applies to the whole group.
    • [^"] - First alternative: any character other than a double quote.
    • | - Or.
    • "" - Second alternative: two consecutive double quotes.
  • ) - End of the non-capturing group.
  • + - The group must occur one or more times.
  • " - Match the closing double quote.
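As a rough PHP sketch (the sample strings come from the question and the comments; this is not the answerer's own code), the pattern can be run with `preg_match_all`, which is how PHP does global matching, so no `g` flag appears in the pattern:

```php
<?php
// Opening quote, then one or more of (a non-quote character | a doubled quote), then closing quote.
$pattern = '/"(?:[^"]|"")+"/';

// Sample from the question: the whole string is one quoted value.
preg_match_all($pattern, '"ASD ""ASD ADS"""', $matches);
// $matches[0] === ['"ASD ""ASD ADS"""']

// Unbalanced input from the comments: the trailing ` s"` is no longer swallowed,
// because a lone quote followed by a space can only act as the closing quote.
preg_match_all($pattern, '"ASD ""ASD ADS""" s"', $unbalanced);
// $unbalanced[0] === ['"ASD ""ASD ADS"""']

print_r($matches[0]);
```

Because of the `+`, an empty quoted value (`""`) is not matched on its own; `*` would allow it.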

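The trailing g above is the "global" flag as written on regex101 or in JavaScript. PHP has no g modifier (preg_* functions reject it with an "Unknown modifier" warning), so in PHP drop it and call preg_match_all to get every match. No other flags are needed.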
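The question asks for `ASD ""ASD ADS""` without the surrounding quotes. One option, which is an adaptation and not part of the original answer, is to wrap the alternation in a capturing group and read group 1. The two-field `$content` line below is a made-up example:

```php
<?php
// Same alternation as above, captured so that group 1 excludes the outer quotes.
$pattern = '/"((?:[^"]|"")+)"/';

// Hypothetical content with two quoted fields; the first is the question's sample.
$content = '"ASD ""ASD ADS""","second"';

preg_match_all($pattern, $content, $matches);

// $matches[0] holds the full matches, including the outer quotes:
//   ['"ASD ""ASD ADS"""', '"second"']
// $matches[1] holds only what was between them:
//   ['ASD ""ASD ADS""', 'second']
print_r($matches[1]);
```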

Valdi_Bo