(I'll never understand why something like this isn't a simple, nice function built into PHP, rather than something that has to be researched, often incorrectly, and cobbled together by every individual programmer, but here goes...)
I do the following to "clean" strings (Unicode) coming from users/external sources:
$string = preg_replace('#[[:cntrl:]]#', '', $string); // Removes all "control characters".
$string = preg_replace('#\p{C}+#u', '', $string); // Removes all "invisible" characters. (As if the control ones above aren't invisible?)
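To make it concrete, here is a minimal sketch of the two calls wrapped into one function. The name `clean_string()` is hypothetical (it is not a PHP built-in), and passing `null` through to the caller is just an assumption about how one might handle bad input: with the `u` modifier, `preg_replace()` returns `null` when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper; clean_string() is an illustrative name, not part of PHP.
function clean_string(string $string): ?string
{
    // No /u modifier here, so this works byte-wise and strips the ASCII
    // control characters (0x00-0x1F and 0x7F), which includes tab, CR and LF.
    $string = preg_replace('#[[:cntrl:]]#', '', $string);

    // With /u the subject is treated as UTF-8. \p{C} is the Unicode "Other"
    // category: control, format (e.g. zero-width space), private-use,
    // surrogate and unassigned code points.
    // On invalid UTF-8 this call returns null, which is passed to the caller.
    return preg_replace('#\p{C}+#u', '', $string);
}

var_dump(clean_string("a\x00b\tc"));     // control bytes removed
var_dump(clean_string("a\u{200B}b"));    // zero-width space removed
var_dump(clean_string("a\xFFb"));        // invalid UTF-8
```

Note that, written this way, legitimate tabs and line breaks are removed as well, and a caller has to check for `null` before using the result.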
Is this enough? Does this take care of all the abusable/weird/special Unicode characters? The whole Unicode thing seems to be a dream for people wanting to be malicious. There's so much weird stuff in that huge set of characters that it seems impossible for any single person to get a grasp of it all.
Am I missing something? Is there perhaps a built-in function which does what I do, only better and more completely? If not, why is that? It sometimes feels like I'm the only one concerned with security/control at all...