I've run into a bit of a problem. I'm making a very basic script just to see how easy the concept is, and I'm really not sure where I should start with it.
My script does the following:
I have an array of words, which will be taken from a DB, but for the sake of this demonstration I've just made it an array with 2 words, "hello" and "goodbye". Normally these will be words that are considered offensive. My script replaces all occurrences of the words in the array with *s, so as to censor them.
One thing I know quite well, as I use a few games etc. that have a similar system, is that this is easily bypassed by using characters such as é instead of e. Hello becomes *****, but Héllo gets through uncensored.
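For illustration, here is a minimal sketch of the behaviour described above. It is not the actual script: the function name censor(), the sample strings, and the choice of case-insensitive matching are assumptions made for the example. The accented input is written as raw UTF-8 bytes so no foreign character appears in the source.

```php
<?php
// Hypothetical stand-in for the word list that would normally come from the DB.
$badWords = array('hello', 'goodbye');

// Replace every case-insensitive occurrence of each listed word with asterisks.
// Assumes the listed words themselves are plain ASCII, so strlen() gives the letter count.
function censor($text, array $words)
{
    foreach ($words as $word) {
        $text = str_ireplace($word, str_repeat('*', strlen($word)), $text);
    }
    return $text;
}

echo censor('Hello there, goodbye!', $badWords) . '<br />';

// "\xC3\xA9" is the UTF-8 byte sequence for U+00E9 (e with acute accent).
// The bytes do not match the plain "e" in the word list, so the word is left alone.
echo censor("H\xC3\xA9llo there", $badWords) . '<br />';
```

The first call censors both words; the second returns its input unchanged, which is exactly the bypass in question.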
What I'd like to know is this. I've not done anything with UTF-8 encoding, nor do I really know how it works with PHP. Is there a way to get all variations of a character? So an e/E with all of the possible accents that exist within UTF-8. If it were ASCII I'd simply make an array with all of the ASCII codes and work that into the code, but I've not been able to find a way to do something similar with UTF-8 characters.
My code works fine, so there's no need for me to post it unless somebody asks me to. What I'd like to achieve is something similar to this, but with UTF-8:
// ASCII codes for "A" and "a"
$a = array(65, 97);
foreach ($a as $x) {
    echo chr($x) . '<br />';
}
This will, obviously, just show A and a. I could work this into my code and replace the words even if they contained these characters as well. Something similar for UTF-8 would be awesome if possible.
Cheers guys/gals.
An addition: I would like to achieve this without actually typing the foreign characters in my code. I don't want é etc. in my PHP. I'd like to convert from something, in the same way as my code does above, but obviously not from ASCII codes; from something else.
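To make the goal concrete, a sketch of the kind of thing being asked for might look like the following. It assumes PHP 7.2 or later with the mbstring extension, where mb_chr() converts a Unicode code point number into a string. The list of code points is a hand-picked, incomplete sample for illustration, not a way of discovering every accented variant automatically.

```php
<?php
// Hand-picked Unicode code points (illustrative, not exhaustive):
// U+0065 and U+0045 are plain "e" and "E"; the rest are accented forms,
// e.g. U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
$codePoints = array(0x65, 0x45, 0xE8, 0xE9, 0xEA, 0xEB, 0xC9);

$chars = array();
foreach ($codePoints as $cp) {
    // mb_chr() is the code point equivalent of chr(); here it returns a UTF-8 string.
    $chars[] = mb_chr($cp, 'UTF-8');
}

// The page must be served as UTF-8 for these to display correctly.
echo implode('<br />', $chars);
```

This mirrors the chr() loop above, with code point numbers in place of ASCII codes, so no accented character has to be typed into the source.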