
I am using the solution from 2202435, but when I add brackets to the string, the resulting array is not what I expect.

```php
$text = 'Lorem ipsum ("dolor sit amet") consectetur "adipiscing \\"elit" dolor';
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);
```

The above code produces

```
Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => ("dolor
            [3] => sit
            [4] => amet")
            [5] => consectetur
            [6] => "adipiscing \"elit"
            [7] => dolor
        )

)
```

But the result I am looking for is

```
Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => (
            [3] => "dolor sit amet"
            [4] => )
            [5] => consectetur
            [6] => "adipiscing \"elit"
            [7] => dolor
        )

)
```

I can get the above result if I put a space after `(` and before `)`.

Could you suggest the correct regex to keep the brackets as separate matches (with an explanation, if possible)?

Thank you.

Dilip A
  • The reason is that the regex you use is meant to keep standalone `"` in the matches. Maybe `'/"(?:\\\\.|[^\\\\"])*"|[^\s"]+/'` will help you. – Wiktor Stribiżew Feb 15 '17 at 09:54
  • Are you sure the unescaped double quotes are always paired in your input? – Wiktor Stribiżew Feb 15 '17 at 10:19
  • @WiktorStribiżew Thank you, your solution works. As for the input, yes, the double quotes always need to be paired, as it is part of a search string used to query the database. Is it also possible to treat words in single quotes as a single token, the same way as double-quoted ones? – Dilip A Feb 15 '17 at 10:45

1 Answer

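The reason is that the regex you use is meant to keep standalone `"` in the matches: the `\S+` alternative matches any run of non-whitespace characters, `"` included, so at `(` the engine matches `("dolor` before the quoted-string alternative ever gets a chance at the `"`.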

If you are sure the unescaped double quotes are always paired in your input, use

```
'/"(?:\\\\.|[^\\\\"])*"|[^\s"]+/'
                        ^^^^^^
```

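Exclude `"` from `\S` by turning it into the negated character class `[^\s]` and adding the double quote inside it: `[^\s"]`. Now a `(` or `)` next to a quote is matched on its own, and the quoted phrase is left for the first alternative.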
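Since an explanation was asked for, here is a sketch of the same double-quote-only pattern written with PCRE's `x` (extended) modifier so that each part carries a comment. The `~` delimiters and the layout are my own choice, not part of the answer; the pattern itself should behave the same as the one-line version, and on the question's string it should give the desired array.

```php
<?php
// The double-quote-only pattern, rewritten with the x (extended) modifier:
// whitespace outside character classes is ignored and # starts a comment.
// The ~ delimiters are used so that nothing in the comments ends the pattern.
$re = '~
    "                  # opening double quote
    (?:                # then any number of:
        \\\\.          #   a backslash followed by any character (an escape)
      | [^\\\\"]       #   or any character that is neither a backslash nor a quote
    )*
    "                  # closing double quote
  |                    # OR
    [^\s"]+            # a run of characters that are neither whitespace nor a quote
~x';

$text = 'Lorem ipsum ("dolor sit amet") consectetur "adipiscing \\"elit" dolor';
preg_match_all($re, $text, $matches);
print_r($matches[0]);
// Lorem, ipsum, (, "dolor sit amet", ), consectetur, "adipiscing \"elit", dolor
```

Note the two escaping layers: the PHP literal `\\\\` becomes `\\` in the string, which the regex engine reads as one literal backslash.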

To also match single-quoted substrings as tokens, you may use

```
'~"(?:\\\\.|[^\\\\"])*"|\'(?:\\\\.|[^\\\\\'])*\'|[^\s"\']+~'
```

See the regex demo and a PHP demo:

```php
$re = '~"(?:\\\\.|[^\\\\"])*"|\'(?:\\\\.|[^\\\\\'])*\'|[^\s"\']+~';
$str = 'Lorem ipsum ("dolor sit amet") consectetur "adipiscing \\"elit" dolor \'something  \\\'here\'';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
// => Array ( [0] => Lorem [1] => ipsum [2] => ( [3] => "dolor sit amet" [4] => )
//   [5] => consectetur [6] => "adipiscing \"elit" [7] => dolor [8] => 'something  \'here' )
```
Wiktor Stribiżew
  • This works, but I am not sure how to escape the string \'apple\\\'s\' ? – Dilip A Feb 15 '17 at 12:29
  • What do you mean by escape it? Use inside PHP code as a string literal? `'apple\'s'` should be defined as `$s = "'apple\\'s'";` – Wiktor Stribiżew Feb 15 '17 at 12:32
  • I mean that if I use `addslashes("'apple's'")`, it returns `\'apple\'s\'`, since the value comes from a posted variable. – Dilip A Feb 15 '17 at 12:38
  • Yes, [it will](https://ideone.com/B90Yc6), so what is the problem? Do you want to say you still need to match "wild" quotes? – Wiktor Stribiżew Feb 15 '17 at 12:39
  • If I run this through the previous solution (https://ideone.com/PvsKtB), the result should be `Array ( [0] => 'apple\'s' )`. – Dilip A Feb 15 '17 at 12:45
  • BTW, the string has no balanced quotes. It is just not possible to handle this kind of a string with regex. You cannot tell a `\'` you want to remove from another `\'` you want to keep. – Wiktor Stribiżew Feb 15 '17 at 12:49
  • Ahh! I see, well the best solution is then to force the user to use double quotes as before and ignore the single quotes in pairs... thank you so much – Dilip A Feb 15 '17 at 12:56
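The limitation discussed in these comments can be reproduced with a short sketch. `$input` is a hypothetical posted value with an unpaired single quote. After `addslashes()`, the single-quote-aware pattern from the answer finds no balanced `'...'` pair and breaks the value into fragments, while the double-quote-only pattern (the approach settled on in the last comment) keeps it as one token.

```php
<?php
// Single-quote-aware pattern from the answer.
$withSingle = '~"(?:\\\\.|[^\\\\"])*"|\'(?:\\\\.|[^\\\\\'])*\'|[^\s"\']+~';
// Double-quote-only pattern from the answer.
$doubleOnly = '~"(?:\\\\.|[^\\\\"])*"|[^\s"]+~';

$input   = "'apple's'";         // hypothetical posted value, quotes not paired
$escaped = addslashes($input);  // \'apple\'s\'

// No closing quote can be found for any opening one, so the single-quote
// alternative never matches and only the unquoted runs survive.
preg_match_all($withSingle, $escaped, $m1);
print_r($m1[0]); // three fragments: \   apple\   s\

// With quotes not treated specially, the whole value is a single token.
preg_match_all($doubleOnly, $escaped, $m2);
print_r($m2[0]);
```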