This is an extension of the PHP sentence boundaries question on SO.
I'd like to know how to change the regex so that newlines are kept as well.
Sample code that splits some text into sentences, removes one sentence, then puts the text back together:
<?php
$re = '/# Split sentences on whitespace between them.
(?<= # Begin positive lookbehind.
[.!?] # Either an end of sentence punct,
| [.!?][\'"] # or end of sentence punct and quote.
) # End positive lookbehind.
(?<! # Begin negative lookbehind.
Mr\. # Skip either "Mr."
| Mrs\. # or "Mrs.",
| Ms\. # or "Ms.",
| Jr\. # or "Jr.",
| Dr\. # or "Dr.",
| Prof\. # or "Prof.",
| Sr\. # or "Sr.",
| T\.V\.A\. # or "T.V.A.",
# or... (you get the idea).
) # End negative lookbehind.
[\s+|^$] # Split on whitespace between sentences/empty lines.
/ix';
$text = <<<EOL
This is paragraph one. This is sentence one. Sentence two!
This is paragraph two. This is sentence three. Sentence four!
EOL;
echo "\nBefore: \n" . $text . "\n";
$sentences = preg_split($re, $text, -1);
$sentences[1] = " "; // remove 'sentence one'
// put text back together
$text = implode( $sentences );
echo "\nAfter: \n" . $text . "\n";
?>
Running this, the output is:
Before:
This is paragraph one. This is sentence one. Sentence two!
This is paragraph two. This is sentence three. Sentence four!
After:
This is paragraph one. Sentence two!
This is paragraph two. This is sentence three. Sentence four!
I'm trying to get the 'After' text to match the 'Before' text, line breaks included, with just the one sentence removed:
After:
This is paragraph one. Sentence two!
This is paragraph two. This is sentence three. Sentence four!
I'm hoping this can be done with a regex tweak, but what am I missing?
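A minimal sketch of one possible tweak, not a confirmed fix: wrap the whitespace in a capturing group and pass `PREG_SPLIT_DELIM_CAPTURE`, so that `preg_split()` returns the separators along with the sentences and `implode()` can restore them. The pattern here is a shortened stand-in for the one above (the abbreviation lookbehind is left out for brevity), the sample text assumes a single newline between the two paragraphs, and the names `$parts` and `$result` are only illustrative.

```php
<?php
// Shortened, hypothetical version of the pattern above: the run of
// whitespace after end-of-sentence punctuation is captured in a group.
$re = '/(?<=[.!?]|[.!?][\'"])(\s+)/';

// Assumed sample input: two paragraphs separated by one newline.
$text = "This is paragraph one. This is sentence one. Sentence two!\n"
      . "This is paragraph two. This is sentence three. Sentence four!";

// PREG_SPLIT_DELIM_CAPTURE keeps each captured separator in the result,
// so sentences land on even indices and whitespace on odd indices.
$parts = preg_split($re, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Remove 'sentence one' (index 2) together with the separator that
// follows it (index 3), leaving every other separator untouched.
$parts[2] = '';
$parts[3] = '';

// An empty glue puts the original whitespace back, newline included.
$result = implode('', $parts);
echo $result, "\n";
```

With this layout, removing a sentence means blanking it and one neighbouring separator; every other separator, including the newline between paragraphs, passes through unchanged.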