24
$str = "This is a    string";
$words = explode(" ", $str);

This works, but each extra space produces an empty string in the array:

$words === array ('This', 'is', 'a', '', '', '', 'string');//true

I would prefer an array that contains only the words, with the number of spaces between them kept in a separate array:

$words === array ('This', 'is', 'a', 'string');//true
$spaces === array(1,1,4);//true

Edit: (1, 1, 4) means one space after the first word, one space after the second word, and four spaces after the third word.

Is there a fast way to do this?

Thank you.

Alan Moore
Haradzieniec

8 Answers

37

To split the string into an array, you should use preg_split():

$string = 'This is a    string';
$data   = preg_split('/\s+/', $string);

For the second part (counting the spaces), use preg_match_all():

$string = 'This is a    string';
preg_match_all('/\s+/', $string, $matches);
$result = array_map('strlen', $matches[0]);// [1, 1, 4]
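The two steps above can be wrapped into one function. This is only a sketch: the name split_words_and_spaces() is hypothetical, it assumes PHP 7.1 or later for the array destructuring, and it adds PREG_SPLIT_NO_EMPTY (not part of the snippet above) so that leading or trailing whitespace does not leave empty strings in the word list.

```php
<?php
// Hypothetical helper (not from the original answer): returns [words, space counts].
function split_words_and_spaces(string $string): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing whitespace would otherwise produce.
    $words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

    // Every run of whitespace, including leading and trailing runs.
    preg_match_all('/\s+/', $string, $matches);
    $spaces = array_map('strlen', $matches[0]);

    return [$words, $spaces];
}

[$words, $spaces] = split_words_and_spaces('This is a    string');
var_export($words);  // 'This', 'is', 'a', 'string'
var_export($spaces); // 1, 1, 4
```

Note that for input with leading or trailing whitespace, $spaces also contains the lengths of those outer runs, so it can have more entries than there are gaps between words.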
Alma Do
3

Here is one way, splitting the string and running a regex once, then parsing the results to see which segments were captured as the split (and therefore only whitespace), or which ones are words:

$temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

$spaces = array();
$words = array_reduce( $temp, function( $result, $item) use ( &$spaces) {
    if( strlen( trim( $item)) === 0) {
        $spaces[] = strlen( $item);
    } else {
        $result[] = $item;
    }
    return $result;
}, array());

You can see from this demo that $words is:

Array
(
    [0] => This
    [1] => is
    [2] => a
    [3] => string
)

And $spaces is:

Array
(
    [0] => 1
    [1] => 1
    [2] => 4
)
nickb
  • Thank you very much for your answer. I've tested both your solution and Alma Do Mundo's / silkfire's. All of them work fine, but Alma Do Mundo's runs about twice as fast. Thank you for your solution anyway. You can compare both if you want (please see my answer to my own question in a moment). – Haradzieniec Sep 05 '13 at 15:06
1

You can use preg_split() for the first array:

$str   = 'This is a    string';
$words = preg_split('#\s+#', $str);

And preg_match_all() for the $spaces array:

preg_match_all('#\s+#', $str, $m);
$spaces = array_map('strlen', $m[0]);
silkfire
0

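Another way would be to use a foreach loop. explode() returns an empty string for every extra consecutive space, so counting those empty strings gives the length of each gap:

$str = "This is a    string";
$words = explode(" ", $str);
$spaces = array(); // number of spaces before each word except the first
$others = array(); // the words themselves
$run = 0;          // empty pieces seen since the last word
foreach ($words as $word) {
    if ($word === '') {
        // An empty piece means one more consecutive space
        $run++;
        continue;
    }
    if ($others) {
        // The gap is the delimiter itself plus the empty pieces
        array_push($spaces, $run + 1);
    }
    array_push($others, $word);
    $run = 0;
}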
Ahmar Ali
0

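Here are the results of the performance tests. Each loop runs just under 100,000 iterations, and time() only has one-second resolution, so the timings are coarse: the first variant took about 1 second and the second about 2 seconds.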

$str = "This is a    string";

var_dump(time());

for ($i = 1; $i < 100000; $i++) {
    // Alma Do Mundo - the winner
    $rgData = preg_split('/\s+/', $str);

    preg_match_all('/\s+/', $str, $rgMatches);
    $rgResult = array_map('strlen', $rgMatches[0]); // [1,1,4]
}
print_r($rgData); print_r($rgResult);
var_dump(time());

for ($i = 1; $i < 100000; $i++) {
    // nickb
    $temp = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $spaces = array();
    // $result is taken by value: array_reduce() passes the carry by value
    $words = array_reduce($temp, function ($result, $item) use (&$spaces) {
        if (strlen(trim($item)) === 0) {
            $spaces[] = strlen($item);
        } else {
            $result[] = $item;
        }
        return $result;
    }, array());
}

print_r($words); print_r($spaces);
var_dump(time());

int(1378392870)
Array ( [0] => This [1] => is [2] => a [3] => string )
Array ( [0] => 1 [1] => 1 [2] => 4 )
int(1378392871)
Array ( [0] => This [1] => is [2] => a [3] => string )
Array ( [0] => 1 [1] => 1 [2] => 4 )
int(1378392873)
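Because time() only counts whole seconds, a finer comparison could use microtime(true) instead. The following is a sketch only: the helper name time_it() is hypothetical, the second variant uses a plain foreach rather than nickb's array_reduce() callback, and no timings are claimed here since they depend on the machine and PHP version.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
function time_it(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true); // float seconds with microsecond precision
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = "This is a    string";

// Variant 1: preg_split() plus preg_match_all()
$splitAndMatch = function () use ($str) {
    $words = preg_split('/\s+/', $str);
    preg_match_all('/\s+/', $str, $matches);
    return [$words, array_map('strlen', $matches[0])];
};

// Variant 2: a single preg_split() with captured delimiters
$splitWithCapture = function () use ($str) {
    $words = $spaces = [];
    $parts = preg_split('/(\s+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        if (trim($part) === '') {
            $spaces[] = strlen($part); // whitespace-only piece: record its length
        } else {
            $words[] = $part;
        }
    }
    return [$words, $spaces];
};

printf("split + match_all: %.4f s\n", time_it($splitAndMatch));
printf("split + capture:   %.4f s\n", time_it($splitWithCapture));
```

Running each variant several times and comparing the spread gives a more reliable picture than a single pass.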

Haradzieniec
0

$financialYear = '2015-2016'; // must be a quoted string; unquoted, 2015-2016 is integer subtraction (-1)

$test = explode('-',$financialYear);
echo $test[0]; // 2015
echo $test[1]; // 2016
Raj
0

Splitting with regex has been demonstrated well by earlier answers, but I think this is a perfect case for calling ctype_space() to determine which result array should receive the encountered value.

Code: (Demo)

$string = "This is a    string";

$words = [];
$spaces = [];

foreach (preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $s) {
    if (ctype_space($s)) {
        $spaces[] = strlen($s);
    } else {
        $words[] = $s;
    }
}

var_export([
    'words' => $words,
    'spaces' => $spaces
]);

Output:

array (
  'words' => 
  array (
    0 => 'This',
    1 => 'is',
    2 => 'a',
    3 => 'string',
  ),
  'spaces' => 
  array (
    0 => 1,
    1 => 1,
    2 => 4,
  ),
)

If you want to replace the piped constants passed to preg_split(), you can use the integer 3 instead. This is PREG_SPLIT_NO_EMPTY (which is 1) combined with PREG_SPLIT_DELIM_CAPTURE (which is 2). Be aware that what you gain in code width, you lose in readability.

preg_split('~( +)~', $string, -1, 3)
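As a quick check of the flag arithmetic described above, this small sketch prints the constant values and confirms that both calls return the same array:

```php
<?php
// The flags are integer bit masks, so combining these two with | gives 3.
var_dump(PREG_SPLIT_NO_EMPTY);                                    // int(1)
var_dump(PREG_SPLIT_DELIM_CAPTURE);                               // int(2)
var_dump((PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) === 3); // bool(true)

$string = "This is a    string";
$named  = preg_split('~( +)~', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$magic  = preg_split('~( +)~', $string, -1, 3);
var_dump($named === $magic); // bool(true)
```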
mickmackusa
0

What about this? Would someone care to profile it?

    $str = str_replace(["\t", "\n", "\r", "\0", "\v"], ' ', $str); // the same whitespace characters trim() strips; \v is a vertical tab
    $words = explode(' ', $str);
    $words = array_values(array_filter($words, 'strlen')); // drop the empty strings left by runs of spaces; a bare array_filter() would also drop the word "0"
Svetoslav Marinov