I am developing my first ever site that needs multilingual support so I've been researching how to get PHP/MySQL/Apache & browser to cooperate. So far so good.
I've even got it working against every check I throw at it (from db, to db, PHP file encoding, Apache AddDefaultCharset, PDO connection string, PHP mbstring functions, ini settings for different PHP versions, etc.).
I had to add accept-charset="utf-8" to POST forms, though. All browsers I tested did in fact match the charset of the page supplied, but they also provide a tool for a user to manually select the character set. While this is terrible for someone like me just learning i18n using UTF-8, I'm happy to report that every browser I've tried has respected my accept-charset request.
However, this made me think: what if a particular browser DIDN'T? Then there are GET vars and cookie vars. What if another encoding got through somehow (better safe than sorry)?
So, I have a question. :)
Would there be any reason to advise against a function like the following pseudo-code, for just in case purposes, and injecting it at the top of a global include?
    // mb_detect_encoding detects the utf-8 with everything I try to throw at it, so it seems reliable
    // in pseudo-code, a recursive calling function like so...
    foreach (get, post, cookie AS superglobal) {
        foreach (superglobal AS key => value) {
            // if array, call self recursively, otherwise parse value
            if (mb_detect_encoding(value) != 'utf-8') {
                unset(superglobal[key]);
            }
        }
    }
This way, as a last resort, if some other encoding made it through somehow, this function would take the data out.
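For concreteness, here is a minimal PHP sketch of that idea, assuming the mbstring extension is loaded. The function name drop_non_utf8 is hypothetical. It uses mb_check_encoding() rather than mb_detect_encoding(), because the latter returns a guessed encoding name (spelled 'UTF-8', so a comparison against 'utf-8' would never match), whereas mb_check_encoding() validates the bytes directly. It also checks keys, since those come from the client too.

```php
<?php
// Hypothetical helper, a sketch only: recursively drop every entry whose
// key or string value is not valid UTF-8. Requires the mbstring extension.
function drop_non_utf8(array $input): array
{
    $clean = [];
    foreach ($input as $key => $value) {
        // Keys are client-supplied as well, so validate string keys.
        if (is_string($key) && !mb_check_encoding($key, 'UTF-8')) {
            continue;
        }
        if (is_array($value)) {
            // Nested arrays (e.g. name[]=... form fields): recurse.
            $clean[$key] = drop_non_utf8($value);
        } elseif (!is_string($value) || mb_check_encoding($value, 'UTF-8')) {
            // Keep non-strings untouched and strings that are valid UTF-8.
            $clean[$key] = $value;
        }
        // Strings that fail validation are simply not copied over.
    }
    return $clean;
}

// Applied once, at the top of a global include.
$_GET    = drop_non_utf8($_GET);
$_POST   = drop_non_utf8($_POST);
$_COOKIE = drop_non_utf8($_COOKIE);
```

Rebuilding the array instead of calling unset() while iterating keeps the recursion simple; whether silently dropping a field is preferable to rejecting the whole request is a separate design choice.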
I don't see the harm, as this function won't see much action resulting from the check. I could also attempt to utf8_encode() before throwing it away in the event that something did make it through. Thoughts? Can any bad come from this? Am I being TOO paranoid?
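The "convert before discarding" variant might look like the sketch below. It is an illustration under assumptions: the helper name to_utf8_or_null is hypothetical, and the fallback source encoding is assumed to be ISO-8859-1, which is what utf8_encode() itself assumes. Note that utf8_encode() is deprecated as of PHP 8.2, so the sketch uses mb_convert_encoding() instead.

```php
<?php
// Hypothetical helper, a sketch only: return the value unchanged if it is
// already valid UTF-8; otherwise reinterpret its bytes as $assumed
// (ISO-8859-1 by default, an assumption) and convert them to UTF-8.
function to_utf8_or_null(string $value, string $assumed = 'ISO-8859-1'): ?string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    $converted = mb_convert_encoding($value, 'UTF-8', $assumed);
    // Return null if the conversion did not yield a valid UTF-8 string.
    if (!is_string($converted) || !mb_check_encoding($converted, 'UTF-8')) {
        return null;
    }
    return $converted;
}
```

Because every byte sequence is valid ISO-8859-1, this conversion practically always "succeeds", so a wrong guess about the source encoding yields mojibake rather than an error. That is one argument for discarding or rejecting the value instead of repairing it.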
I just want to be careful because I've read that character-encoding attacks are possible.
EDIT: I guess I could use array_map() or array_walk_recursive() or something else. The implementation doesn't matter; the idea is what's important.