I am the author of the cited sentence splitting answer. Here's a modified version that may suit your purposes:
An enhanced regex solution
Assuming you do care about handling "Mr.", "Mrs.", etc. abbreviations, the following single-regex solution works pretty well:
<?php // test.php Rev:20140218_1500
$re = '/# Match sentence ending in .!? followed by optional quote.
    (                        # $1: Sentence.
      [^.!?]+                # One or more non-end-of-sentence chars.
      (?:                    # Zero or more not-end-of-sentence dots.
        \.                   # Allow dot mid-sentence, but only if:
        (?:                  # Group allowable dot alternatives.
          (?=[^\s\'"])       # Dot is ok if followed by a non-ws, non-quote char,
        | (?<=               # or if it ends one of the following:
            Mr\.             # Either "Mr."
          | Mrs\.            # or "Mrs.",
          | Ms\.             # or "Ms.",
          | Jr\.             # or "Jr.",
          | Dr\.             # or "Dr.",
          | Prof\.           # or "Prof.",
          | Sr\.             # or "Sr.",
          | T\.V\.A\.        # or "T.V.A.",
                             # or... (you get the idea).
          )                  # End positive lookbehind.
        )                    # End group of allowable dot alternatives.
        [^.!?]*              # Zero or more non-end-of-sentence chars.
      )*                     # End zero or more not-end-of-sentence dots.
      (?:                    # Sentence end alternatives.
        [.!?]                # Either end of sentence punctuation
        [\'"]?               # followed by optional quote,
      | $                    # or end of string with no punctuation.
      )                      # End sentence end alternatives.
    )                        # End $1: Sentence.
    (?:\s+|$)                # Sentence ends with ws or EOS.
    /ix';
$text = 'This is sentence one. Sentence two! Sentence thr'.
    'ee? Sentence "four". Sentence "five"! Sentence "'.
    'six"? Sentence "seven." Sentence \'eight!\' Dr. '.
    'Jones said: "Mrs. Smith you have a lovely daught'.
    'er!" The T.V.A. is a big project! Last sentence '.
    'with no ending punctuation';
$sentences = array(); // Initialize array of sentences.
// Callback: store capture group 1 (the sentence) of each match.
function _getSentencesCallback($matches) {
    global $sentences;
    $sentences[] = $matches[1];
    return '';
}
// The replacement result is discarded; the call is used only to visit each match.
preg_replace_callback($re, '_getSentencesCallback', $text);
for ($i = 0; $i < count($sentences); ++$i) {
    printf("Sentence[%d] = [%s]\n", $i + 1, $sentences[$i]);
}
?>
Note that you can easily add abbreviations to, or remove them from, the lookbehind alternatives in the expression. Given the following test paragraph:
This is sentence one. Sentence two! Sentence three? Sentence "four". Sentence "five"! Sentence "six"? Sentence "seven." Sentence 'eight!' Dr. Jones said: "Mrs. Smith you have a lovely daughter!" The T.V.A. is a big project! Last sentence with no ending punctuation
Here is the output from the script:
Sentence[1] = [This is sentence one.]
Sentence[2] = [Sentence two!]
Sentence[3] = [Sentence three?]
Sentence[4] = [Sentence "four".]
Sentence[5] = [Sentence "five"!]
Sentence[6] = [Sentence "six"?]
Sentence[7] = [Sentence "seven."]
Sentence[8] = [Sentence 'eight!']
Sentence[9] = [Dr. Jones said: "Mrs. Smith you have a lovely daughter!"]
Sentence[10] = [The T.V.A. is a big project!]
Sentence[11] = [Last sentence with no ending punctuation]
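If you'd rather not rely on a global array and preg_replace_callback(), the same matches can be collected with preg_match_all(). This is a minimal sketch, not part of the original test.php: it assumes PHP 7+ (for the type hints), uses a hypothetical function name splitSentences(), and keeps the same pattern with the abbreviation alternatives written on one line.

```php
<?php
// Sketch (not the original script): same pattern, collected with
// preg_match_all() instead of preg_replace_callback() plus a global array.
// splitSentences() is a hypothetical name; the type hints need PHP 7+.
function splitSentences(string $text): array
{
    $re = '/
        (                            # $1: Sentence.
          [^.!?]+                    # One or more non-end-of-sentence chars.
          (?:                        # Zero or more not-end-of-sentence dots.
            \.                       # Allow dot mid-sentence, but only if:
            (?:
              (?=[^\s\'"])           # followed by a non-ws, non-quote char,
            | (?<=                   # or it ends one of these abbreviations:
                Mr\. | Mrs\. | Ms\. | Jr\. | Dr\. | Prof\. | Sr\. | T\.V\.A\.
              )
            )
            [^.!?]*                  # Zero or more non-end-of-sentence chars.
          )*
          (?: [.!?] [\'"]? | $ )     # End punctuation plus optional quote, or EOS.
        )                            # End $1: Sentence.
        (?:\s+|$)                    # Sentence ends with ws or EOS.
        /ix';

    // $matches[1] holds capture group 1 of every match, in order.
    preg_match_all($re, $text, $matches);
    return $matches[1];
}

foreach (splitSentences('Dr. Jones said hi. The T.V.A. is big! Last one') as $i => $s) {
    printf("Sentence[%d] = [%s]\n", $i + 1, $s);
}
```

preg_match_all() scans for successive non-overlapping matches just as preg_replace_callback() does, so the sentences come back in the same order without any shared state.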
Hope this helps, and happy regexing!
Edit (2014-02-19 08:00): a last sentence at the end of the string no longer requires ending punctuation.
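Since the abbreviation list is the part you are most likely to change, one option is to generate the lookbehind from a plain array. The sketch below is an illustration, not the original code: buildSentenceRegex() and splitWithAbbreviations() are hypothetical names, it assumes PHP 7+, and it drops the x flag so that the generated pattern needs no comments. preg_quote() escapes the dots in each abbreviation.

```php
<?php
// Sketch (not the original script): build the abbreviation lookbehind from a
// plain array, so adding or removing an abbreviation means editing the list.
// buildSentenceRegex() and splitWithAbbreviations() are hypothetical names.
function buildSentenceRegex(array $abbreviations): string
{
    // preg_quote() escapes regex metacharacters such as "." ("Dr." -> "Dr\.");
    // passing the delimiter also escapes any "/" in an abbreviation.
    $quoted = array_map(function ($abbr) {
        return preg_quote($abbr, '/');
    }, $abbreviations);

    // Each alternative is a fixed-length literal, which a PCRE lookbehind
    // needs; alternatives of different lengths are fine. With an empty list,
    // (?!) is used instead: it never matches, so no abbreviation dot is allowed.
    $lookbehind = $quoted ? '(?<=' . implode('|', $quoted) . ')' : '(?!)';

    return '/('
        . '[^.!?]+'                                              // sentence text
        . '(?:\.(?:(?=[^\s\'"])|' . $lookbehind . ')[^.!?]*)*'  // allowed mid-sentence dots
        . '(?:[.!?][\'"]?|$)'                                    // end punctuation + optional quote, or EOS
        . ')(?:\s+|$)/i';
}

function splitWithAbbreviations(string $text, array $abbreviations): array
{
    preg_match_all(buildSentenceRegex($abbreviations), $text, $matches);
    return $matches[1]; // Capture group 1 of every match.
}

print_r(splitWithAbbreviations('See e.g. this one. Dr. Who? Done', ['Dr.', 'e.g.']));
```

With that list, "e.g." and "Dr." no longer end a sentence, so the call returns the three sentences "See e.g. this one.", "Dr. Who?" and "Done".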