28

For example, I have an article that should be split at sentence boundaries such as ".", "?", "!" and ":".

But as we all know, both preg_split and explode remove the delimiter.

Any help would be really appreciated!

EDIT:

The only thing I could come up with is the code below, but it works well.

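// Append a "[D]" marker after each sentence delimiter (presumably to split on the marker afterwards).
$content = preg_replace('/([.?!:])/', '$1[D]', $content);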
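For readers wondering how the marker approach finishes, here is a minimal sketch. It assumes the text never contains the literal string "[D]" on its own, and the variable names $marked and $pieces are only illustrative:

```php
<?php
$content = 'One. Two? Three!';

// Step 1: append the "[D]" marker after every delimiter.
// $marked is now "One.[D] Two?[D] Three![D]"
$marked = preg_replace('/([.?!:])/', '$1[D]', $content);

// Step 2: split on the marker, trim surrounding whitespace,
// and drop the empty piece left after the final marker.
$pieces = array_values(array_filter(
    array_map('trim', explode('[D]', $marked)),
    'strlen'
));

// $pieces is ['One.', 'Two?', 'Three!']
var_export($pieces);
```

If the input could already contain "[D]", a marker that cannot occur in the text would be needed instead.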

Thank you, everyone! I got three answers within five minutes. I apologize for not reading the PHP manual carefully before asking.

Gumbo
user353889

5 Answers

27

I feel this is worth adding. You can keep the delimiter in the "after" string by using regex lookahead to split:

$input = "The address is http://stackoverflow.com/";
$parts = preg_split('@(?=http://)@', $input);
// $parts[1] is "http://stackoverflow.com/"

And if the delimiter is of fixed length, you can keep the delimiter in the "before" part by using lookbehind:

$input = "The address is http://stackoverflow.com/";
$parts = preg_split('@(?<=http://)@', $input);
// $parts[0] is "The address is http://"

This solution is simpler and cleaner in most cases.
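Applied to the sentence delimiters from the question, the lookbehind idea might look like the sketch below. The sample text is made up, and consuming the whitespace with \s* plus adding PREG_SPLIT_NO_EMPTY are my own additions, so treat it as an illustration rather than a complete sentence splitter:

```php
<?php
$text = 'Is it done? Yes: it works. Great!';

// Split right after any of . ? ! : (the lookbehind keeps the delimiter
// in the "before" part) and swallow the whitespace that follows.
// PREG_SPLIT_NO_EMPTY drops the empty piece after the final "!".
$sentences = preg_split('/(?<=[.?!:])\s*/', $text, -1, PREG_SPLIT_NO_EMPTY);

// $sentences is ['Is it done?', 'Yes:', 'it works.', 'Great!']
var_export($sentences);
```

Note that runs of delimiters such as "..." would be split after each character with this pattern, so real text probably needs a more careful expression.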

wavemode
19

You can set the flag PREG_SPLIT_DELIM_CAPTURE when using preg_split and capture the delimiters too. Then you can take each pair of elements at indices 2n and 2n+1 and put them back together:

$parts = preg_split('/([.?!:])/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$sentences = [];
for ($i = 0, $n = count($parts) - 1; $i <= $n; $i += 2) {
    $sentences[] = $parts[$i] . ($parts[$i+1] ?? '');
}

Note that the delimiter pattern must be wrapped in a capturing group; otherwise the delimiters won’t be captured.

PiTheNumber
Gumbo
17

Use preg_split with the PREG_SPLIT_DELIM_CAPTURE flag.

For example:

$parts = preg_split("/([\.\?\!\:])/", $string, -1, PREG_SPLIT_DELIM_CAPTURE);
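
To make the shape of the result concrete, here is a small sketch with a made-up input string. The second call, which combines the flag with PREG_SPLIT_NO_EMPTY, is an optional variation and not part of the answer above:

```php
<?php
$string = 'Hi. Ok?';

// Text and captured delimiters alternate in the result. Because the
// string ends with a delimiter, the last element is an empty string.
$parts = preg_split('/([.?!:])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
// $parts is ['Hi', '.', ' Ok', '?', '']

// Flags can be combined with | to drop empty elements as well.
$nonEmpty = preg_split(
    '/([.?!:])/',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
// $nonEmpty is ['Hi', '.', ' Ok', '?']

var_export($parts);
```

Keep in mind that once empty elements are dropped, text and delimiters are no longer guaranteed to sit at alternating indices, for example when two delimiters are adjacent.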
evandrix
Shaun Hare
0

Try T-Regx

<?php
$parts = pattern('([.?!:])')->split($string);
Danon
0

Parsing English sentences involves a lot of nuance and many fringe cases, which makes crafting a perfect parser very difficult. It is important to have sufficient test cases using your real project data to make sure that you are covering all scenarios.

There is no need to use lookarounds or capture groups for this task. Simply match the punctuation symbol(s), discard them from the match with \K, then match the one or more whitespace characters that occur between sentences. The PREG_SPLIT_NO_EMPTY flag prevents empty elements from being created if your string starts or ends with characters that satisfy the pattern.

Code:

$str = 'Heading: This is a string. Very exciting! What do you think? ...one more thing, this is cool.';

var_export(
    preg_split('~[.?!:]+\K\s+~', $str, 0, PREG_SPLIT_NO_EMPTY)
);

Output:

array (
  0 => 'Heading:',
  1 => 'This is a string.',
  2 => 'Very exciting!',
  3 => 'What do you think?',
  4 => '...one more thing, this is cool.',
)
mickmackusa