First of all, there are no invalid UTF-8 characters, only invalid UTF-8 bytes and byte sequences. Those usually point to a broken client or to someone attempting an encoding attack on your server. Validate incoming input with mb_check_encoding
and fail immediately with 400 Bad Request if it is not valid UTF-8.
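A minimal sketch of that check, assuming the mbstring extension is loaded. The helper names is_valid_utf8 and reject_unless_utf8 are made up for illustration and are not part of PHP:

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true only when $input is well-formed UTF-8.
function is_valid_utf8(string $input): bool
{
    return mb_check_encoding($input, 'UTF-8');
}

// Hypothetical guard: ends the request with 400 Bad Request
// when the input is not valid UTF-8, otherwise does nothing.
function reject_unless_utf8(string $input): void
{
    if (!is_valid_utf8($input)) {
        http_response_code(400);
        exit('Bad Request');
    }
}
```

You would call something like reject_unless_utf8(file_get_contents('php://input')) before touching the data. Note that a string containing the SUBSTITUTE character passes this check, because "\x1A" is perfectly valid UTF-8.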
What you have is just the SUBSTITUTE control character (U+001A), which is a valid character but an unprintable one.
Originally intended for use as a transmission control character to
indicate that garbled or invalid characters had been received. It has
often been put to use for other purposes when the in-band signaling of
errors it provides is unneeded, especially where robust methods of
error detection and correction are used, or where errors are expected
to be rare enough to make using the character for other purposes
advisable.
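You can use this regex to strip it, along with every other control character except carriage return, line feed, and tab:
$reg = '/(?![\r\n\t])\p{Cc}/u';
// preg_replace returns the new string; it does not modify $str in place
$str = preg_replace( $reg, '', $str );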
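If you want this as a reusable function, here is one way to wrap the pattern above. The name strip_control_chars is my own, and throwing on failure is just one possible design choice. With the /u modifier, preg_replace returns null when the subject is not valid UTF-8, so that case is handled explicitly:

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper around the pattern above: removes every
// control character (Unicode category Cc) except \r, \n and \t.
function strip_control_chars(string $str): string
{
    $clean = preg_replace('/(?![\r\n\t])\p{Cc}/u', '', $str);

    // With /u, preg_replace returns null on malformed UTF-8 input.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $clean;
}
```

For example, strip_control_chars("a\x1Ab") gives "ab", while "line1\r\n\tline2" comes back unchanged.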