54

How can I explode the following string:

Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor

into

array("Lorem", "ipsum", "dolor sit amet", "consectetur", "adipiscing elit", "dolor")

The text inside quotation marks should be treated as a single word.

Here's what I have for now:

$mytext = "Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor";
$noquotes = str_replace("%22", "", $mytext);
$newarray = explode(" ", $noquotes);

but my code splits every word into its own array element. How do I make the words inside quotation marks be treated as one word?

Drew Hammond
timofey
  • 2
    This sounds like a job for Regex – Earlz Feb 04 '10 at 19:10
  • See also [An explode() function that ignores characters inside quotes?](http://stackoverflow.com/questions/3264775/an-explode-function-that-ignores-characters-inside-quotes) – Bergi Sep 10 '13 at 21:43

5 Answers

90

This would have been much easier with str_getcsv().

$test = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
var_dump(str_getcsv($test, ' '));

Gives you

array(6) {
  [0]=>
  string(5) "Lorem"
  [1]=>
  string(5) "ipsum"
  [2]=>
  string(14) "dolor sit amet"
  [3]=>
  string(11) "consectetur"
  [4]=>
  string(15) "adipiscing elit"
  [5]=>
  string(5) "dolor"
}
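
If the input still carries URL-encoded quotes (%22), as in the question's snippet, one option is to decode them first and then hand the result to str_getcsv(). This is a minimal sketch, assuming the quotes are the only encoded characters that matter, since rawurldecode() decodes every %XX sequence. The enclosure and escape arguments are passed explicitly because, as far as I know, PHP 8.4 deprecates relying on the default escape value:

```php
<?php
$mytext = 'Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor';

// Turn %22 back into a literal double quote.
$decoded = rawurldecode($mytext);

// Split on spaces; text enclosed in double quotes stays together and loses its quotes.
$words = str_getcsv($decoded, ' ', '"', '\\');

print_r($words);
```

The variable names $decoded and $words are only illustrative.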
Drew Hammond
Petah
  • This works on my development machine, but not on my production server. :-/ – Martin Ueding Mar 17 '12 at 18:22
  • 4
    str_getcsv requires PHP 5.3. – armakuni Aug 02 '13 at 06:18
  • 5
    Be aware that it "ignores" the quotes. If you also need them in the split output, this won't work. – Gayan Dasanayake Apr 12 '18 at 14:38
  • I ran some speed tests, and preg_match_all is about 3-5 times quicker. Probably not an issue for most people, especially if you don't need the quotes (in this case it's much easier to use), but I think it's worth a mention. – err Jan 01 '19 at 13:51
  • @err care to share your tests? – Petah Jan 05 '19 at 09:12
  • Nothing special: I just wrapped both in a for loop from 1 to 10000 and checked microtime before and after. Both are fast enough for single use, even at that test quantity, which is why I said it probably won't be a problem for most of us. – err Jan 06 '19 at 10:10
89

You could use preg_match_all(...):

$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing \\"elit" dolor';
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $text, $matches);
print_r($matches);

which will produce:

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => "dolor sit amet"
            [3] => consectetur
            [4] => "adipiscing \"elit"
            [5] => dolor
        )

)

And as you can see, it also accounts for escaped quotes inside quoted strings.

EDIT

A short explanation:

"           # match the character '"'
(?:         # start non-capture group 1 
  \\        #   match the character '\'
  .         #   match any character except line breaks
  |         #   OR
  [^\\"]    #   match any character except '\' and '"'
)*          # end non-capture group 1 and repeat it zero or more times
"           # match the character '"'
|           # OR
\S+         # match a non-whitespace character: [^\s] and repeat it one or more times

And to match %22 instead of double quotes, you'd do:

preg_match_all('/%22(?:\\\\.|(?!%22).)*%22|\S+/', $text, $matches);
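
Note that the matches above keep their surrounding quotes, while the array asked for in the question does not. One way to drop them is to capture the quoted content and the bare word in separate groups. This is a sketch under that assumption, not the only way to do it; the capture groups and the $words variable are additions for illustration, and escaped quotes inside a quoted string are left as they are (backslash included):

```php
<?php
$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';

// Group 1: the content between double quotes. Group 2: a bare, unquoted word.
preg_match_all('/"((?:\\\\.|[^\\\\"])*)"|(\S+)/', $text, $matches, PREG_SET_ORDER);

$words = array();
foreach ($matches as $m) {
    // \S+ never matches an empty string, so a non-empty group 2 means a bare word;
    // otherwise the match was a quoted string and group 1 holds its content.
    $words[] = (isset($m[2]) && $m[2] !== '') ? $m[2] : $m[1];
}

print_r($words);
```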
Bart Kiers
  • 1
    Is there a reason not to use `preg_split` instead of `preg_match_all`? it seems like a more natural fit IMO. – prodigitalson Feb 04 '10 at 19:20
  • That's Awesome! I'll have to study the code for a bit to figure what just happened! thanks – timofey Feb 04 '10 at 19:21
  • 3
    @prodigitalson: no, using `preg_split(...)` you cannot account for escaped characters. `preg_match_all(...)` "behaves" more like a parser, which is the more natural thing to do here. Besides, using `preg_split(...)`, you'll need to look ahead on each space to see how many quotes are ahead of it, making it an `O(n^2)` operation: no problem for small strings, but it might noticeably increase the runtime when larger strings are involved. – Bart Kiers Feb 04 '10 at 19:31
  • @timofey, see my edit. Don't hesitate to ask for more clarification if it's not clear to you: you're the one maintaining the code, so you should understand it (and I'm more than happy to provide extra information if it's needed). – Bart Kiers Feb 04 '10 at 19:36
  • Thanks Bart K.! I was already searching google for answers on that one:) – timofey Feb 04 '10 at 19:39
  • But then if I want to replace Lorem ipsum %22dolor sit amet%22 consectetur %22adipiscing elit%22 dolor (basically the quotation marks are listed as %22) the following doesn't seem to work: preg_match_all('/%22(?:\\\\.|[^\\\\"])*%22|\S+/', $text, $matches); – timofey Feb 04 '10 at 19:43
  • That's beginning to make sense! Thanks – timofey Feb 04 '10 at 19:56
  • In single quoted php strings the '\' won't escape so you don't need \\\\ for one \. – Calmarius Dec 22 '10 at 13:39
  • Oh, that's not true: \ and ' still need to be escaped. Sorry. – Calmarius Dec 22 '10 at 18:39
  • why is your solution doing this http://pastebin.com/bhrnMGST to this string - this has a \"quoted sentence\" inside – madphp Jun 10 '11 at 13:55
  • @Bart Kiers does your solution apply to my example? – madphp Jun 10 '11 at 15:52
  • @Bart Kiers Thanks! What if I have single quotes? – madphp Jun 10 '11 at 18:50
  • @Bart Kiers Things have changed a little bit. Sorry about this. After using mysql_real_escape_string() I get this: this has a \\\'quoted sentence\\\' inside. So I need to account for those extra slashes (I don't know if it makes a difference) and for single or double quotes. – madphp Jun 10 '11 at 18:57
  • @Bart Kiers it wouldnt last 2 minutes. haha. Give me one more hit of regex and i'll be gone. – madphp Jun 10 '11 at 19:09
  • Preg split alternative: https://stackoverflow.com/a/32034603/2897386 – DustWolf Dec 16 '20 at 13:53
4

You can also try this multiple explode function:

function multiexplode($delimiters, $string)
{
    // Replace every delimiter with the first one, then split on that single delimiter.
    $ready = str_replace($delimiters, $delimiters[0], $string);
    $launch = explode($delimiters[0], $ready);
    return $launch;
}

$text = "here is a sample: this text, and this will be exploded. this also | this one too :)";
$exploded = multiexplode(array(",",".","|",":"),$text);

print_r($exploded);
Taryn
Nikz
2

I came here with a complex string splitting problem similar to this, but none of the answers here did exactly what I wanted - so I wrote my own.

I am posting it here just in case it is helpful to someone else.

This is probably a very slow and inefficient way to do it - but it works for me.

function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0;
    foreach($chars as $char)
    {
        if(in_array($char, $openers))
            $depth++;
        elseif(in_array($char, $closers))
            $depth--;
        elseif(in_array($char, $togglers))
        {
            if($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                // we are outside a toggle block, enter it and increase the depth
                $depth++;

            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char;

        if($depth < 0) $depth = 0;

        if(in_array($char, $delimiters) &&
           $depth == 0 &&
           !in_array($char, $closers))
        {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if(strlen($nextpart) > 0)
        $parts[] = $nextpart;

    return $parts;
}

Usage is as follows. explode_adv takes 5 arguments:

  1. An array of characters that open a block - e.g. [, (, etc.
  2. An array of characters that close a block - e.g. ], ), etc.
  3. An array of characters that toggle a block - e.g. ", ', etc.
  4. An array of characters that should cause a split into the next part.
  5. The string to work on.
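
As an illustration of those five arguments, here is a hedged usage sketch. The function body is a condensed copy of explode_adv() from above, repeated only so the snippet runs on its own; the sample strings and the $words / $fields variables are assumptions made for the example:

```php
<?php
// Condensed copy of explode_adv() from above, guarded so it is not declared twice.
if (!function_exists('explode_adv')) {
    function explode_adv($openers, $closers, $togglers, $delimiters, $str)
    {
        $parts = [];
        $nextpart = "";
        $toggle_states = array_fill_keys($togglers, false);
        $depth = 0;
        foreach (str_split($str) as $char) {
            if (in_array($char, $openers)) {
                $depth++;
            } elseif (in_array($char, $closers)) {
                $depth--;
            } elseif (in_array($char, $togglers)) {
                // Leaving a toggle block lowers the depth, entering one raises it.
                $depth += $toggle_states[$char] ? -1 : 1;
                $toggle_states[$char] = !$toggle_states[$char];
            } else {
                $nextpart .= $char;
            }
            if ($depth < 0) $depth = 0;
            if (in_array($char, $delimiters) && $depth == 0 && !in_array($char, $closers)) {
                $parts[] = substr($nextpart, 0, -1);
                $nextpart = "";
            }
        }
        if (strlen($nextpart) > 0) $parts[] = $nextpart;
        return $parts;
    }
}

// The string from the question: no openers or closers, '"' toggles a block, ' ' splits.
$text = 'Lorem ipsum "dolor sit amet" consectetur "adipiscing elit" dolor';
$words = explode_adv([], [], ['"'], [' '], $text);
print_r($words);

// Brackets and quotes together: commas inside (...) or "..." do not split.
$fields = explode_adv(['('], [')'], ['"'], [','], 'a,(b,c),"d,e",f');
print_r($fields);
```

Note that the opener, closer and toggler characters themselves are dropped from the resulting parts.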

This method probably has flaws - edits are welcome.

starbeamrainbowlabs
1

In some situations the little-known token_get_all() might prove useful:

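// $text is assumed to hold the input string, e.g. the example from the question.
$tokens = token_get_all("<?php $text ?>");
$separator = ' ';
$items = array();
$item = "";
$last = count($tokens) - 1;
foreach($tokens as $index => $token) {
    // Skip the opening and closing PHP tags.
    if($index != 0 && $index != $last) {
        // Tokens are either [id, text, line] arrays or single-character strings;
        // is_array() avoids calling count() on a string, which throws a TypeError in PHP 8.
        if(is_array($token)) {
            if($token[0] == T_CONSTANT_ENCAPSED_STRING) {
                // Quoted string: strip the surrounding quotes.
                $token = substr($token[1], 1, -1);
            } else {
                $token = $token[1];
            }
        }
        if($token == $separator) {
            $items[] = $item;
            $item = "";
        } else {
            $item .= $token;
        }
    }
}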

Results:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor sit amet
    [3] => consectetur
    [4] => adipiscing elit
    [5] => dolor
)
Drew Hammond
cleong