0

I'm using PHP's preg_match_all function, and I need it to return an array containing every single word AND every pair of adjacent words. For example:

preg_match_all('/the regex/','Stackoverflow is awesome',$matches);

The $matches array should contain:

('Stackoverflow' , 'is' , 'awesome' , 'Stackoverflow is' , 'is awesome')

I tried this regex, but it doesn't give the expected results:

[a-z]+\s?[a-z]*

Nicolas Durán
  • I don't think a regex is what you're looking for. This question seems similar: http://stackoverflow.com/q/3200272/3933332 – Rizier123 Mar 16 '15 at 15:33

5 Answers

2

I don't think you can achieve that with just regular expressions. I would say, use explode and construct the array yourself.

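$string = 'Stackoverflow is awesome';
$parts = explode(' ', $string);
// Cache the word count before the loop: the loop appends to $parts, so
// calling count($parts) in the condition would never let the loop end.
$count = count($parts);
for ($i = 1; $i < $count; $i++) {
    // Append each pair of adjacent words after the single words.
    $parts[] = $parts[$i - 1] . ' ' . $parts[$i];
}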
Jeroen Noten
    I ended up using your answer, but instead of explode I use `preg_split("/[\s,.]+/", $string)` function because I am dealing with natural text (with commas, etc.) – Nicolas Durán Mar 16 '15 at 17:01
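
The preg_split variant from the comment above can be combined with the pair-building loop. What follows is only a minimal sketch: the helper name `wordsAndPairs` is hypothetical, and it assumes that whitespace, commas, and periods are the only separators that matter.

```php
<?php
// Hypothetical helper (not from the original answer): returns every word,
// followed by every pair of adjacent words.
function wordsAndPairs(string $text): array
{
    // Split on runs of whitespace, commas, and periods. PREG_SPLIT_NO_EMPTY
    // drops the empty string that a trailing period would otherwise produce.
    $words = preg_split('/[\s,.]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $pairs = [];
    for ($i = 1, $n = count($words); $i < $n; $i++) {
        $pairs[] = $words[$i - 1] . ' ' . $words[$i];
    }

    return array_merge($words, $pairs);
}

print_r(wordsAndPairs('Stackoverflow is awesome, really.'));
```

Note that this sketch pairs words across punctuation ("awesome really" in the call above). If pairs should not cross a comma or period, the text would have to be split into clauses first.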
1

Use \S+ to match the individual words. A second pass with \S+\s+\S+ won't find every pair, because by default a regex engine does not return overlapping matches: once "Stackoverflow is" has been consumed, "is" cannot start another match. To get overlapping matches, put the two-word pattern in a capturing group and place that group inside a positive lookahead, so that each pair is captured without being consumed.
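
To see the effect of the lookahead in isolation, here is a small sketch comparing the plain two-word pattern with the same pattern wrapped in a lookahead. The four-word input string and the variable names `$plain` and `$overlapping` are made up for illustration.

```php
<?php
$s = 'Stackoverflow is awesome today';

// Plain two-word pattern: each match consumes both words, so the pair
// that starts at "is" is skipped.
preg_match_all('~\S+\s+\S+~', $s, $plain);

// The same pattern inside a lookahead: the capture group records the pair,
// while only the first word is consumed by the trailing \S+.
preg_match_all('~(?=(\S+\s+\S+))\S+~', $s, $overlapping);

print_r($plain[0]);        // two pairs; "is awesome" is missing
print_r($overlapping[1]);  // all three adjacent pairs
```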

$s = "Stackoverflow is awesome";
$regex = '~(?=(\S+\s+\S+))|\S+~';
preg_match_all($regex, $s, $matches);
$matches = array_values(array_filter(call_user_func_array('array_merge', $matches)));
print_r($matches);

Output:

Array
(
    [0] => Stackoverflow
    [1] => is
    [2] => awesome
    [3] => Stackoverflow is
    [4] => is awesome
)
Avinash Raj
0

This limits each phrase to at most two words.

<?php
$str = "Stackoverflow is awesome";
$words = explode(" ",$str);
$num_words = count($words);
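// $i is the index of the first word of the phrase, $j of the last.
for ($i = 0; $i < $num_words; $i++) {
  for ($j = $i; $j < $num_words; $j++) {
    $num = 0;   // number of words in the current phrase
    $temp = "";
    for ($k = $i; $k <= $j; $k++) {
      $num++;
      $temp .= $words[$k] . " ";
    }

    // Print only phrases of one or two words.
    if ($num < 3) {
      echo $temp . "<br />";
    }
  }
}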
?>
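
The loops above build every possible phrase and then discard those longer than two words. As a rough sketch of the same idea that builds only the phrases it keeps, one could write something like the following. The function name `phrasesUpTo` and the parameter `$maxWords` are hypothetical, and the result is returned as an array instead of being echoed.

```php
<?php
// Hypothetical helper: all runs of 1..$maxWords consecutive words,
// in the same order the nested loops above would print them.
function phrasesUpTo(string $str, int $maxWords = 2): array
{
    $words = explode(' ', $str);
    $count = count($words);
    $phrases = [];
    for ($i = 0; $i < $count; $i++) {
        // Extend the phrase starting at word $i one word at a time,
        // stopping at $maxWords or at the end of the input.
        for ($len = 1; $len <= $maxWords && $i + $len <= $count; $len++) {
            $phrases[] = implode(' ', array_slice($words, $i, $len));
        }
    }
    return $phrases;
}

print_r(phrasesUpTo('Stackoverflow is awesome'));
```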
Demodave
0

Try this simple regex (it matches each single word; it does not produce the pairs):

 /\w+/i

Usage:

 preg_match_all('/\w+/i','Stackoverflow is awesome',$matches);
 print_r($matches);

See this in action here

Amit Verma
0

You can use lookaheads here:

preg_match_all('/(?=(\b(\w+)(?:\s+(\w+)\b|$)))/','Stackoverflow is awesome',$matches);

Now double words:

print_r($matches[1]);
Array
(
    [0] => Stackoverflow is
    [1] => is awesome
    [2] => awesome
)

And single words:

print_r($matches[2]);
Array
(
    [0] => Stackoverflow
    [1] => is
    [2] => awesome
)

PS: awesome also appears among the double words because it is the last word: the |$ alternative lets the final word match on its own.
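
If the stray last word is unwanted, one possible way to get the single flat array from the question is to keep only the group-1 captures that contain whitespace and append them to the single words. This is just a sketch; the variable names `$pairs` and `$result` are made up here.

```php
<?php
preg_match_all('/(?=(\b(\w+)(?:\s+(\w+)\b|$)))/', 'Stackoverflow is awesome', $matches);

// Keep only the group-1 captures that really contain two words,
// i.e. those with whitespace inside them.
$pairs = array_values(array_filter($matches[1], function ($m) {
    return preg_match('/\s/', $m) === 1;
}));

// Single words first, then the pairs.
$result = array_merge($matches[2], $pairs);
print_r($result);
```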

anubhava