I am looking for help making this code more accurate. For any given text ($my_block_of_text), the script below breaks the content up into sentences wherever full stops, exclamation marks and similar end-of-sentence punctuation occur.
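// Split on . ? ! : ] and keep each delimiter as its own array element.
$parts = preg_split('/([.?!:\]])/', $my_block_of_text, -1, PREG_SPLIT_DELIM_CAPTURE);
$sentences = array();
// Elements alternate between text and delimiter, so rejoin them in pairs.
for ($i = 0, $n = count($parts) - 1; $i < $n; $i += 2) {
    $sentences[] = $parts[$i] . $parts[$i + 1];
}
// Keep any trailing text that has no closing punctuation.
if ($parts[$n] !== '') {
    $sentences[] = $parts[$n];
}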
The issue with this code, however, is that the regular expression passed to preg_split doesn't account for titles such as Mr., Mrs., Miss. and Ms., so the text is also split after each of them. How can an exclusion be added to the regular expression to avoid splitting in these cases?
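To make it concrete, here is a minimal sketch of what such an exclusion might look like, assuming negative lookbehind assertions are an acceptable way to do it. The function name split_sentences and the trimming of each sentence are my own assumptions for illustration, and the list of titles is only the four mentioned above. A full stop is treated as a delimiter only when it is not directly preceded by one of those titles; the other punctuation marks are handled as before.

```php
<?php
// Hypothetical helper: the name, the trimming and the title list are
// assumptions made for this sketch, not part of the original script.
function split_sentences($text)
{
    // A full stop counts as a delimiter only when it is NOT directly
    // preceded by Mr, Mrs, Ms or Miss (one lookbehind per title).
    // The characters ? ! : ] always count as delimiters.
    $pattern = '/((?<!\bMr)(?<!\bMrs)(?<!\bMs)(?<!\bMiss)\.|[?!:\]])/';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $sentences = array();
    $n = count($parts) - 1;
    // Elements alternate between text and delimiter; rejoin them in pairs.
    for ($i = 0; $i < $n; $i += 2) {
        $sentences[] = trim($parts[$i] . $parts[$i + 1]);
    }
    // Keep trailing text that has no closing punctuation.
    if (trim($parts[$n]) !== '') {
        $sentences[] = trim($parts[$n]);
    }
    return $sentences;
}

print_r(split_sentences('Mr. Smith met Mrs. Jones. They talked! Done'));
```

With that input I would expect three entries: "Mr. Smith met Mrs. Jones.", "They talked!" and "Done". Other abbreviations (Dr., etc., e.g.) would still be split, so the list of lookbehinds would need extending for real text.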
Thanks.