3

I'm trying to use PHP to split a string into array components using either " or ' as the delimiter. I only want to split on the outermost pair of quotes. Here are four examples and the desired result for each:

$pattern = "?????";
$str = "the cat 'sat on' the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
    [0] => the cat 
    [1] => 'sat on'
    [2] =>  the mat
)*/

$str = "the cat \"sat on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
    [0] => the cat 
    [1] => "sat on"
    [2] =>  the mat
)*/

$str = "the \"cat 'sat' on\" the mat";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
    [0] => the
    [1] => "cat 'sat' on"
    [2] =>  the mat
)*/

$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$res = preg_split($pattern, $str);
print_r($res);
/*output:
Array
(
    [0] => the
    [1] => 'cat "sat" on'
    [2] =>  the mat
    [3] => 'when "it" was'
    [4] =>  seventeen
)*/

As you can see, I only want to split on the outermost quotation marks and ignore any quotations nested inside them.

The closest I have come up with for $pattern is

$pattern = "/((?P<quot>['\"])[^(?P=quot)]*?(?P=quot))/";

but obviously this is not working.

mulllhausen

4 Answers

2

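You can use preg_split with the PREG_SPLIT_DELIM_CAPTURE option. The regular expression is not quite as elegant as @Jan Turoň's back-reference approach: a back reference needs its own capture group, and with PREG_SPLIT_DELIM_CAPTURE that extra group would add unwanted elements to the results, so each quote type is spelled out as a separate alternative instead.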

$str = "the 'cat \"sat\" on' the mat the \"cat 'sat' on\" the mat";
// -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
$match = preg_split("/('[^']*'|\"[^\"]*\")/U", $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
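
For reference, here is a minimal sketch of how this split behaves on the question's examples. The wrapper name `splitOuterQuotes` is hypothetical, and `PREG_SPLIT_NO_EMPTY` is an addition that is not part of the answer; it is assumed here only to drop the empty element produced when the string starts or ends with a quoted part.

```php
<?php
// Hypothetical helper wrapping the preg_split call from the answer.
function splitOuterQuotes(string $str): array
{
    // Either a single-quoted run or a double-quoted run, captured so that
    // PREG_SPLIT_DELIM_CAPTURE returns it alongside the surrounding text.
    $pattern = "/('[^']*'|\"[^\"]*\")/";

    return preg_split(
        $pattern,
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

print_r(splitOuterQuotes("the \"cat 'sat' on\" the mat"));
```

This seems to work without a back reference because the regex engine takes the leftmost match, which always starts at the outermost opening quote, and the negated character class then skips over nested quotes of the other kind. It assumes a quoted part never contains a quote of its own kind.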
mcrumley
  • I believe your solution is more elegant. +1 – Jan Turoň Sep 10 '12 at 16:04
  • that was what i was after. its surprising it didn't need back-references, but great! – mulllhausen Sep 10 '12 at 22:24
  • if i want to extend the regex to ignore escaped quotes will that be easy? for example `$str = "the 'cat s\'at on' the mat"` should give `[0] => the , [1] => 'cat s\'at on', [2] => the mat`. if not then i will add a fresh new question for this. cheers! – mulllhausen Sep 11 '12 at 00:31
  • `"/('(?:.*)(?<!\\\\)(?>\\\\\\\\)*'|\"(?:.*)(?<!\\\\)(?>\\\\\\\\)*\")/U"` - Yes, all those backslashes are required. – mcrumley Sep 11 '12 at 14:47
  • That will allow backslash-escaped quotes. A double backslash will NOT escape a quote, but will be a literal backslash. You will have to remove the extras after splitting. – mcrumley Sep 11 '12 at 14:49
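
As an alternative to the pattern in the comment above, here is a sketch using a different, commonly used form for quoted strings with backslash escapes. It is not the commenter's regex; the function name `splitOuterQuotesEscaped` and the use of a nowdoc (to avoid doubling every backslash in the PHP source) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: split on outermost quotes, honouring backslash escapes.
function splitOuterQuotesEscaped(string $str): array
{
    // Inside the quotes, accept either a character that is neither the quote
    // nor a backslash, or a backslash followed by any single character.
    // The nowdoc keeps the regex exactly as written, with no PHP-level escaping.
    $pattern = <<<'REGEX'
/('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")/s
REGEX;

    return preg_split(
        $pattern,
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// The PHP literal below contains the characters: the 'cat s\'at on' the mat
print_r(splitOuterQuotesEscaped("the 'cat s\\'at on' the mat"));
```

As with the commenter's version, a double backslash is treated as a literal backslash rather than as an escape for the following quote, and the backslashes are left in the returned pieces.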
1

You can use just preg_match for this:

$str = "the \"cat 'sat' on\" the mat";
$pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';

if (preg_match($pattern, $str, $matches)) {
  printf("[initial] => %s\n[quoted] => %s\n[end] => %s\n",
     $matches[1],
     $matches[2],
     $matches[4]
  );
}

This prints:

[initial] => the 
[quoted] => "cat 'sat' on"
[end] =>  the mat

Here is an explanation of the regex:

  • /^([^\'"]*) => put the initial bit until the first quote (either single or double) in the first captured group
  • (([\'"]).*\3) => capture in \2 the text corresponding from the initial quote (either single or double) (that is captured in \3) until the closing quote (that must be the same type as the opening quote, hence the \3). The fact that the regexp is greedy by nature helps to get from the first quote to the last one, regardless of how many quotes are inside.
  • (.*)$/ => Capture until the end in \4
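
To illustrate what the greedy `.*` described above implies when a string contains more than one quoted section, here is a small sketch. The function name `matchFirstToLastQuote` and the returned array keys are hypothetical; the pattern is the one from this answer.

```php
<?php
// Hypothetical wrapper around the preg_match call from the answer.
function matchFirstToLastQuote(string $str): ?array
{
    $pattern = '/^([^\'"]*)(([\'"]).*\3)(.*)$/';

    if (!preg_match($pattern, $str, $m)) {
        return null; // no quoted section found
    }

    return ['initial' => $m[1], 'quoted' => $m[2], 'end' => $m[4]];
}

// Two quoted sections of the same kind: the greedy .* runs from the first
// opening quote to the last matching quote, so both sections and the text
// between them end up in the single 'quoted' element.
print_r(matchFirstToLastQuote("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```

So this pattern fits strings with a single outer quotation, such as the first three examples in the question, but it does not split the fourth example into five parts.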
Carlos Campderrós
1

Yet another solution using preg_replace_callback

$result1 = array();
function parser($p) {
  global $result1;
  $result1[] = $p[0];
  return "|"; // temporary delimiter
}

$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
$str = preg_replace_callback("/(['\"]).*\\1/U", "parser", $str);
$result2 = explode("|",$str); // using temporary delimiter

Now you can zip those arrays using array_map

$result = array();
function zipper($a,$b) {
  global $result;
  if($a) $result[] = $a;
  if($b) $result[] = $b;
}
array_map("zipper",$result2,$result1);
print_r($result);

And the result is

[0] => the 
[1] => 'cat "sat" on'
[2] =>  the mat 
[3] => 'when "it" was'
[4] =>  seventeen

Note: It would probably be better to create a class that does this, so the global variables can be avoided.
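
One way to avoid the globals without writing a full class is a closure that captures a local array by reference. The sketch below is only an illustration of that idea: the function name `splitByCallback` is hypothetical, and using a NUL byte as the temporary delimiter assumes the input never contains one.

```php
<?php
// Hypothetical helper: same technique as above, but without global state.
function splitByCallback(string $str): array
{
    $quoted = [];
    $delimiter = "\0"; // assumption: NUL never occurs in the input text

    // Replace each quoted part with the delimiter, remembering it in $quoted.
    $rest = preg_replace_callback(
        "/(['\"]).*\\1/U",
        function (array $m) use (&$quoted, $delimiter): string {
            $quoted[] = $m[0];
            return $delimiter;
        },
        $str
    );

    // The plain pieces always number one more than the quoted pieces,
    // so interleave them: plain[0], quoted[0], plain[1], quoted[1], ...
    $result = [];
    foreach (explode($delimiter, $rest) as $i => $chunk) {
        if ($chunk !== '') {
            $result[] = $chunk;
        }
        if (isset($quoted[$i])) {
            $result[] = $quoted[$i];
        }
    }

    return $result;
}

print_r(splitByCallback("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```

Comparing against the empty string, rather than testing truthiness, keeps a piece such as "0" from being dropped.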

Jan Turoň
0

You can use a back reference and the ungreedy modifier (U) in preg_match_all

$str = "the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen";
preg_match_all("/(['\"])(.*)\\1/U", $str, $match);
print_r($match[0]);

Now you have your outermost quotation parts

[0] => 'cat "sat" on'
[1] => 'when "it" was'

And you can find the rest of the string with substr and strpos (kind of blackbox solution)

$a = $b = 0; $result = array();
foreach($match[0] as $part) {
  $b = strpos($str,$part,$a); // search from $a so a repeated quoted part is not found at its first occurrence
  $result[] = substr($str,$a,$b-$a);
  $result[] = $part;
  $a = $b+strlen($part);
}
$result[] = substr($str,$a);
print_r($result);

Here is the result

[0] => the 
[1] => 'cat "sat" on'
[2] =>  the mat 
[3] => 'when "it" was'
[4] =>  seventeen

Just strip any empty leading/trailing element, which appears when a quotation sits at the very beginning or end of the string.
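
The position lookup can also be left to the regex engine: with the PREG_OFFSET_CAPTURE flag, preg_match_all returns the byte offset of each match, so no separate strpos search is needed. The following is a sketch under that assumption; the function name `splitWithOffsets` is hypothetical, and skipping empty pieces is a choice made here rather than part of the answer.

```php
<?php
// Hypothetical helper: rebuild the full split from match offsets.
function splitWithOffsets(string $str): array
{
    // Each entry of $matches[0] is a pair [matched text, byte offset].
    preg_match_all("/(['\"])(.*)\\1/U", $str, $matches, PREG_OFFSET_CAPTURE);

    $result = [];
    $pos = 0; // end of the previously consumed part
    foreach ($matches[0] as [$text, $offset]) {
        if ($offset > $pos) {
            $result[] = substr($str, $pos, $offset - $pos); // text before the quote
        }
        $result[] = $text;
        $pos = $offset + strlen($text);
    }
    if ($pos < strlen($str)) {
        $result[] = substr($str, $pos); // trailing text after the last quote
    }

    return $result;
}

print_r(splitWithOffsets("the 'cat \"sat\" on' the mat 'when \"it\" was' seventeen"));
```

Because the offsets come from the match itself, a quoted part that occurs twice in the string is placed correctly both times.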

Jan Turoň
  • that does work, but it doesn't return the bits outside the match, which i do need – mulllhausen Sep 10 '12 at 14:43
  • it is a possible solution. if nobody comes up with a single regex method then i will award the answer. but i would prefer a single regex as it would be simpler. – mulllhausen Sep 10 '12 at 14:52
  • I updated my solution to include the bits outside the match, is it ok? – Jan Turoň Sep 10 '12 at 14:57
  • it does what i'm after, but i would still prefer a one line regular expression. anyway, thanks for this solution - if nobody posts a one line solution then i will award this one the tick :) – mulllhausen Sep 10 '12 at 15:04