
I have a string like the one below. The values in the example could be numbers or text, in uppercase, lowercase, or both. A value that is a sentence is enclosed in single quotes:

$string="a,b,c,(d,e,f),g,'h, i j.',k";

How can I explode that to get the following result?

Array([0]=>"a",[1]=>"b",[2]=>"c",[3]=>"(d,e,f)",[4]=>"g",[5]=>"'h, i j.'",[6]=>"k")

I think a regular expression would be a fast and clean solution. Any ideas?

EDIT: This is what I have done so far. It is very slow for strings with a long part between parentheses:

$separator="*"; // whatever which is not used in the string
$Pattern="'[^,]([^']+),([^']+)[^,]'";
while(ereg($Pattern,$String,$Regs)){
    $String=ereg_replace($Pattern,"'\\1$separator\\2'",$String);
}

$Pattern="\(([^(^']+),([^)^']+)\)";
while(ereg($Pattern,$String,$Regs)){
    $String=ereg_replace($Pattern,"(\\1$separator\\2)",$String);
}

return $String;

This replaces all the commas between quotes and between parentheses with $separator. Then I can explode the string on commas and replace $separator with the original comma.
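For reference, here is a minimal sketch of the same mask-then-explode idea on current PHP versions, where the ereg functions are no longer available (they were removed in PHP 7.0). The helper name maskAndExplode and the "\x01" placeholder are assumptions made for illustration, and the arrow functions assume PHP 7.4 or later. It handles only one level of parentheses:

```php
<?php
// Hypothetical helper: the name and the "\x01" placeholder are assumptions.
function maskAndExplode(string $input, string $placeholder = "\x01"): array
{
    // Hide the commas inside '...' or (...) (one nesting level only).
    $masked = preg_replace_callback(
        "~'[^']*'|\\([^)]*\\)~",
        fn(array $m): string => str_replace(',', $placeholder, $m[0]),
        $input
    );

    // Split on the remaining commas, then restore the hidden ones.
    return array_map(
        fn(string $part): string => str_replace($placeholder, ',', $part),
        explode(',', $masked)
    );
}

print_r(maskAndExplode("a,b,c,(d,e,f),g,'h, i j.',k"));
```

Each protected part is rewritten once by the callback, so no while loop over the whole string is needed.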

SAVAFA

1 Answer


You can do the job using preg_match_all:

$string="a,b,c,(d,e,f),g,'h, i j.',k";

preg_match_all("~'[^']+'|\([^)]+\)|[^,]+~", $string, $result);
print_r($result[0]);

Explanation:

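The trick is to try the quoted and parenthesized alternatives first, and only then fall back to matching everything up to the next comma.

~          pattern delimiter
'          a literal single quote
[^']       any character except a single quote
+          one or more times
'          a literal single quote
|          or
\([^)]+\)  the same with parentheses
|          or
[^,]+      any character except a comma, one or more times
~          pattern delimiter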
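As a quick check of the pattern, here is a small sketch that wraps it in a function. The name splitFields is hypothetical and not part of any library. It also shows one behaviour worth knowing: empty fields produce no match, so they are dropped, unlike with explode.

```php
<?php
// Hypothetical wrapper around the pattern explained above.
function splitFields(string $input): array
{
    preg_match_all("~'[^']+'|\\([^)]+\\)|[^,]+~", $input, $matches);
    return $matches[0]; // index 0 holds the full matches
}

print_r(splitFields("a,b,c,(d,e,f),g,'h, i j.',k"));

// Two consecutive commas yield no empty element:
var_dump(splitFields("a,,b") === ['a', 'b']);
```

If empty fields matter in your data, this pattern alone is not enough and the input would need separate handling.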

Note that the quantifiers in [^']+', in [^)]+\) and also in [^,]+ are all automatically optimized to possessive quantifiers at compile time by PCRE's "auto-possessification". The first two because the character class cannot match the character that follows it, and the last because it is at the end of the pattern. In both cases, backtracking into the quantifier could never help, so it is unnecessary.
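If you prefer not to rely on that optimization, the possessive quantifiers can be written out with ++. A short sketch comparing the two spellings on the sample string (the variable names are only illustrative):

```php
<?php
$string = "a,b,c,(d,e,f),g,'h, i j.',k";

// Implicit form: PCRE may turn these quantifiers into possessive ones itself.
preg_match_all("~'[^']+'|\\([^)]+\\)|[^,]+~", $string, $implicit);

// Explicit form: ++ never gives back characters it has already matched.
preg_match_all("~'[^']++'|\\([^)]++\\)|[^,]++~", $string, $explicit);

// Both spellings return the same list of fields.
var_dump($implicit[0] === $explicit[0]);
```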

If you have more than one kind of delimiter that, like quotes, is the same for opening and closing, you can write your pattern like this, using a capture group:

$string="a,b,c,(d,e,f),g,'h, i j.',k,°l,m°,#o,p#,@q,r@,s";

preg_match_all('~([\'#@°]).*?\1|\([^)]+\)|[^,]+~', $string, $result);
print_r($result[0]);

Explanation:

(['#@°])   one character from the class, captured in group 1
.*?        any character, zero or more times, in lazy mode
\1         the content of group 1 (the same delimiter again)
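One caveat, stated as an assumption about your setup: if the script and the data are UTF-8, the ° character is two bytes long. Without the u modifier PCRE works byte by byte, so the character class may capture only half of the character. A sketch with the u modifier added (my addition, not part of the pattern above):

```php
<?php
// Assumes this file and the input string are UTF-8 encoded.
$string = "a,°l,m°,#o,p#,s";

// The u modifier makes PCRE treat pattern and subject as UTF-8,
// so ° inside the character class is a single character.
preg_match_all('~([\'#@°]).*?\1|\([^)]+\)|[^,]+~u', $string, $result);
print_r($result[0]);
```

With ASCII-only delimiters such as ', # and @ the modifier makes no difference.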

With nested parentheses:

$string="a,b,(c,(d,(e),f),t),g,'h, i j.',k,°l,m°,#o,p#,@q,r@,s";

preg_match_all('~([\'#@°]).*?\1|(\((?:[^()]+|(?-1))*+\))|[^,]+~', $string, $result);
print_r($result[0]);
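The relative reference (?-1) recurses into the group that matches one balanced pair of parentheses. The same idea can be spelled with a named group, which some find easier to read. This is only a sketch: the group name paren is my choice, and the ° delimiter is left out to keep the example ASCII-only.

```php
<?php
$string = "a,b,(c,(d,(e),f),t),g,'h, i j.',k";

// (?<paren>...) names the balanced-parentheses group;
// (?&paren) recurses into it for each nested pair.
$pattern = '~([\'#@]).*?\1|(?<paren>\((?:[^()]++|(?&paren))*+\))|[^,]+~';

preg_match_all($pattern, $string, $result);
print_r($result[0]);
```

Note that an unbalanced opening parenthesis is not protected: the recursive alternative fails and the text is then split at every comma by the last alternative.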
Casimir et Hippolyte
  • Thanks, I'll try this and let you know how that is working in my real case. Just one more thing, I have added more description to the question and don't know how it could affect your solution. – SAVAFA May 10 '13 at 10:56
  • @SaVaFa: you can do the same thing with single quote, see the update. – Casimir et Hippolyte May 10 '13 at 12:08
  • @CasimiretHippolyte: It works fine and I think it is the solution. I just noticed that it has a problem with nested parentheses like (,(,),), but it works for the given question. +1 – Navnath Godse May 10 '13 at 12:29
  • @Navnath: a solution for nested parentheses has been added. – Casimir et Hippolyte May 10 '13 at 12:55
  • @CasimiretHippolyte, it works well, but not correctly in all cases. For example, for this string `$string="'2ziFw, 3xOHEmwQP0 HxHXK5e',42,'aa, g','b',(87,700),56";` using `preg_match_all("~(['#@°]).*?\1|\([^)]++\)|[^,]++~", $string,$result);`, it should result in `Array ( [0] => '2ziFw, 3xOHEmwQP0 HxHXK5e' [2] => 42 [3] => 'aa, g' [4] => 'b' [5] => (87,700) [6] => 56 )`, but it does not: it breaks at every comma. Many thanks in advance. – SAVAFA May 10 '13 at 17:39
  • @SaVaFa: corrected: just change double quote for single quote and escape single quotes in the pattern. – Casimir et Hippolyte May 10 '13 at 18:16
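
The reason behind that last fix: in a double-quoted PHP string, "\1" is an octal escape for the byte 0x01, so the regex engine never receives a backreference. A small sketch of the difference, using a shortened version of the string from the comment (the variable names $broken and $fixed are only illustrative):

```php
<?php
// Double quotes: "\1" becomes the single byte 0x01 before PCRE sees it.
var_dump("\1" === chr(1));
// Single quotes: the backslash is kept, so PCRE sees the backreference \1.
var_dump('\1' === '\\1');

$string = "'aa, g','b',(87,700),56";

// Double-quoted pattern: the quoted alternative can never match.
preg_match_all("~(['#@]).*?\1|\([^)]++\)|[^,]++~", $string, $broken);

// Single-quoted pattern with escaped quotes: works as intended.
preg_match_all('~([\'#@]).*?\1|\([^)]++\)|[^,]++~', $string, $fixed);

print_r($broken[0]); // quoted values are split at their commas
print_r($fixed[0]);  // quoted values stay whole
```

Writing "\\1" inside double quotes would also keep the backreference intact.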