I need to split a string into overlapping groups of three words using PHP. For example, given:

"This is an example of what I need."

The output would be:

This is an
is an example
an example of
example of what
of what I
what I need

I have this example in Java:

String myString = "This is an example of what I need.";
String[] words = myString.split("\\s+");
for (int i = 0; i < words.length; i++) {
    String threeWords;
    if (i == words.length - 1)
        threeWords = words[i];
    else if (i == words.length - 2)
        threeWords = words[i] + " " + words[i + 1];
    else
        threeWords = words[i] + " " + words[i + 1] + " " + words[i + 2];
    System.out.println(threeWords);
}
Dharman
moonn86
  • @moonn86 what is the desired output when the word count is not divisible by 3? What if there aren't 3 words? – mickmackusa Sep 16 '21 at 08:26
  • the data you need is called trigrams. (ngrams for the general n-word case). Should help you google for a solution – alexis Sep 16 '21 at 08:26
  • @mick, the trigrams are overlapping – alexis Sep 16 '21 at 08:27
  • @alexis yes, I see the overlap, but first trigram is omitted. We don't know what is desired when the word count is not divisible by 3. This question lacks a comprehensive [mcve], a php coding attempt, and proof of research. It should be closed as Needs Clarity. – mickmackusa Sep 16 '21 at 08:30
  • the first trigram is omitted by mistake, you can see that from the java code. Yeah the question is unclear and low-effort, but from a new user. – alexis Sep 16 '21 at 08:36
  • Do you need all punctuation to be removed? – mickmackusa Sep 16 '21 at 09:02

2 Answers

A solution that uses explode(), array_slice() and implode():

$example = 'This is an example of what I need.';

$arr = explode(" ",$example);
foreach($arr as $key => $word){
  //if($key == 0) continue;
  $subArr = array_slice($arr,$key,3);
  if(count($subArr) < 3) break;
  echo implode(" ",$subArr)."<br>\n";
}

Output:

This is an
is an example
an example of
example of what
of what I
what I need.

If you want to suppress the first output (the one starting with "This"), uncomment the line

//if($key == 0) continue;

If the input can have fewer than three words and those words should still be output, the line with the break must read as follows:

if(count($subArr) < 3 AND $key != 0) break;
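
The loop and the modified break condition can be folded into one reusable function. This is only a minimal sketch: the name wordTriples() is hypothetical (it does not appear in the answer), and it assumes that an input with fewer than three words should come back as a single group.

```php
<?php
// Hypothetical helper: returns overlapping three-word groups from a
// space-separated string. Inputs with fewer than three words yield
// one group holding whatever words exist (an assumption).
function wordTriples(string $text): array
{
    $words = explode(' ', $text);
    $groups = [];
    foreach ($words as $key => $word) {
        $slice = array_slice($words, $key, 3);
        // Stop at the first incomplete group, except at the very start.
        if (count($slice) < 3 && $key != 0) {
            break;
        }
        $groups[] = implode(' ', $slice);
    }
    return $groups;
}

print_r(wordTriples('This is an example of what I need.'));
print_r(wordTriples('Two words'));
```

Returning an array instead of echoing leaves the caller free to choose the separator (a line break, "<br>", and so on).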

For strings whose words are not separated only by single spaces (multiple spaces, punctuation, line breaks), preg_split() is recommended. Example:

$example = "This  is an example of what I need.
 Sentence,containing a comma and Raphaël.";

$arr = preg_split('/[ .,;\r\n\t]+/u', $example, 0, PREG_SPLIT_NO_EMPTY);
foreach($arr as $key => $word){
  $subArr = array_slice($arr,$key,3);
  if(count($subArr) < 3) break;
  echo implode(" ",$subArr)."<br>\n";
}

Output:

This is an
is an example
an example of
example of what
of what I
what I need
I need Sentence
need Sentence containing
Sentence containing a
containing a comma
a comma and
comma and Raphaël
jspit
  • As demonstrated [here](https://stackoverflow.com/a/21713214/2943403), `\s` is more brief to use inside the character class. Your pattern will not isolate quote-wrapped, parenthetical, etc words. – mickmackusa Sep 16 '21 at 12:16

To include only words (notice that the full stop is removed from the last trigram), use str_word_count() to form the array of strings.

Then you need to loop while there are three elements to print.

Code: (Demo)

$example = 'This is an example of what I need.';

$words = str_word_count($example, 1);
for ($x = 0; isset($words[$x + 2]); ++$x) {
    printf("%s %s %s\n", $words[$x], $words[$x + 1], $words[$x + 2]);
}

Output:

This is an
is an example
an example of
example of what
of what I
what I need

If you don't like printf(), you could echo implode(' ', [the three elements]).
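
The implode() alternative might look like the sketch below. It keeps the same loop condition and lets array_slice() pick the three elements; collecting the lines in a $lines array first is an illustrative choice, not part of the original answer.

```php
<?php
$example = 'This is an example of what I need.';

// str_word_count() with format 1 returns the words as an array.
$words = str_word_count($example, 1);
$lines = [];
for ($x = 0; isset($words[$x + 2]); ++$x) {
    // Join the three words starting at offset $x with single spaces.
    $lines[] = implode(' ', array_slice($words, $x, 3));
}
echo implode("\n", $lines), "\n";
```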

If you want to print a single string of words when there are fewer than three words in total, you could use a post-test loop. Demo
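
A post-test loop could be sketched as follows. The function name trigramLines() is hypothetical, and the sketch assumes that a short input should produce exactly one line containing the available words. Note that an empty input then yields one empty line.

```php
<?php
// Hypothetical helper built around a do-while (post-test) loop.
function trigramLines(string $text): array
{
    $words = str_word_count($text, 1);
    $lines = [];
    $x = 0;
    do {
        // The body runs once before the condition is checked, so a
        // short input still yields one line with the words it has.
        $lines[] = implode(' ', array_slice($words, $x, 3));
        ++$x;
    } while (isset($words[$x + 2]));
    return $lines;
}

print_r(trigramLines('Only two'));
print_r(trigramLines('This is an example of what I need.'));
```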

And then, of course, if we are going to stumble down the rocky road of "what is a word", an ironclad definition of a word is needed first, and then a regex (potentially with multibyte support) has to be crafted to match it. Basic Demo
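
As one possible definition, the sketch below treats a word as a run of Unicode letters and combining marks, optionally joined by apostrophes or hyphens. Both that definition and the function name unicodeTrigrams() are assumptions made for illustration; they are not taken from the linked demo, and other definitions of a word would need a different pattern.

```php
<?php
// Hypothetical helper: extracts words with a UTF-8 aware regex, then
// builds overlapping three-word groups.
function unicodeTrigrams(string $text): array
{
    // \p{L} = any Unicode letter, \p{M} = any combining mark.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/[\p{L}\p{M}]+(?:[\'-][\p{L}\p{M}]+)*/u';
    preg_match_all($pattern, $text, $matches);
    $words = $matches[0];

    $groups = [];
    for ($x = 0; isset($words[$x + 2]); ++$x) {
        $groups[] = sprintf('%s %s %s', $words[$x], $words[$x + 1], $words[$x + 2]);
    }
    return $groups;
}

print_r(unicodeTrigrams('Sentence,containing a comma and Raphaël.'));
```

With this pattern "Raphaël" stays a single word, provided the source file and the input are UTF-8 encoded.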

mickmackusa
  • Note that str_word_count does not support multibyte characters. If the text contains words like Raphaël, it becomes Rapha l (2 words!) – jspit Sep 16 '21 at 10:36
  • There are workarounds for this fringe case. https://stackoverflow.com/a/19274144/2943403 `str_word_count()` does remove the punctuation at the end of the sentence though. :) – mickmackusa Sep 16 '21 at 11:09