To match these end-of-sentence punctuation marks only when they are followed by whitespace and a lowercase letter, use
'~\w+[.?!]+\s+(?=\p{Ll})~u'
Explanation:
\w+ - 1+ alphanumeric/underscore symbols
[.?!]+ - 1+ literal ., ? or ! symbols
\s+ - 1+ whitespace symbols
(?=\p{Ll}) - a positive lookahead that requires a lowercase letter right after the whitespace without adding it to the match (see Unicode character properties for \p{Ll} details and more Unicode category classes).
In PHP, use the u modifier (the ~u at the end of the pattern) since you are working with Unicode strings: it makes both the pattern and the subject be treated as UTF-8.
Here is a PHP code demo:
$re = '~\w+[.?!]+\s+(?=\p{Ll})~u';
$arr = array("Howdy world? lorem", "Howdy world... lorem", "Howdy world? lorem", "What is reality. howdy ",
"Howdy you. Lorem ", "Howdy you. 進撃の ");
print_r(preg_grep($re, $arr));
// => Array([0] => Howdy world? lorem [1] => Howdy world... lorem
//[2] => Howdy world? lorem [3] => What is reality. howdy )
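
preg_grep only reports which array entries contain a match. To see what the pattern itself consumes, a minimal sketch with preg_match_all on a single string may help; the sample text and the variable names are made up for illustration:

```php
<?php
// Same pattern as above: word, 1+ of . ? !, 1+ whitespace,
// then a lookahead for a lowercase letter (not part of the match).
$re = '~\w+[.?!]+\s+(?=\p{Ll})~u';

// Hypothetical sample text, not taken from the question.
$text = "What is reality. howdy you. Lorem ipsum? dolor";

preg_match_all($re, $text, $matches);
print_r($matches[0]);
// => Array([0] => "reality. " [1] => "ipsum? ")
// "you. " is skipped because the next letter, "L", is uppercase.
```

Note that each match includes the trailing whitespace, since \s+ sits outside the lookahead. If only the word and the punctuation are wanted, the whitespace could be moved into the lookahead, as in (?=\s+\p{Ll}).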