I am developing my first-ever site that needs multilingual support, so I've been researching how to get PHP, MySQL, Apache and the browser to cooperate. So far so good.

I've even got it to pass every check I throw at it (data from the db, data to the db, PHP file encoding, Apache AddDefaultCharset, the PDO connection string, PHP mbstring functions, ini settings for different PHP versions, etc.).

I did have to add accept-charset="utf-8" to my POST forms, though. Every browser I tested submitted form data in the charset of the page it was served, but browsers also give the user a tool to manually select a different character set. That is terrible for someone like me who is just learning i18n with UTF-8, but I'm happy to report that every browser I've tried has respected my accept-charset request.

However, this made me think: what if a particular browser DIDN'T? Then there are GET and cookie variables. What if another encoding got through somehow? Better safe than sorry.

So, I have a question. :)

  • Would there be any reason to advise against a function like the following pseudo-code, for just in case purposes, and injecting it at the top of a global include?

    // mb_detect_encoding reports UTF-8 for everything I've thrown at it, so it seems reliable
    // in pseudo-code, a recursive function like so...
    foreach (get, post, cookie AS superglobal) {
        foreach (superglobal AS key => value) {
            // if value is an array, call self recursively; otherwise check the value
            if (mb_detect_encoding(value) != 'UTF-8') {
                unset(superglobal[key]);
            }
        }
    }
    

This way, as a last resort, if some other encoding made it through somehow, this function would take the data out.
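
A minimal runnable sketch of the idea, under some assumptions: the function name `drop_invalid_utf8` is hypothetical, the mbstring extension is loaded, and `mb_check_encoding()` is swapped in for `mb_detect_encoding()` because it validates against one named encoding instead of guessing among several. It is an illustration, not a vetted security filter.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of any library):
// returns a copy of $data without keys or string values that are invalid UTF-8.
function drop_invalid_utf8(array $data): array
{
    $clean = [];
    foreach ($data as $key => $value) {
        // Keys arrive from the client too, so validate them as well.
        if (!mb_check_encoding((string) $key, 'UTF-8')) {
            continue;
        }
        if (is_array($value)) {
            // Nested arrays (e.g. name="tags[]" fields) are filtered recursively.
            $clean[$key] = drop_invalid_utf8($value);
        } elseif (is_string($value) && mb_check_encoding($value, 'UTF-8')) {
            $clean[$key] = $value;
        }
        // Anything else is dropped, mirroring the unset() in the pseudo-code.
    }
    return $clean;
}

// Intended for the top of a global include.
$_GET    = drop_invalid_utf8($_GET);
$_POST   = drop_invalid_utf8($_POST);
$_COOKIE = drop_invalid_utf8($_COOKIE);
```

Building a filtered copy avoids calling unset() on an array while iterating over it by reference.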

I don't see the harm, since the check should rarely find anything to remove. I could also attempt to utf8_encode() a value before throwing it away, in case something did make it through. Thoughts? Can anything bad come from this? Am I being TOO paranoid?
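
For reference, utf8_encode() treats its input as ISO-8859-1 and is deprecated as of PHP 8.2, so a "convert before discarding" step would look roughly like the sketch below. The function name `to_utf8_assuming_latin1` is hypothetical, and the ISO-8859-1 fallback is purely an assumption about what the stray bytes might be.

```php
<?php
// Hypothetical helper: valid UTF-8 is returned unchanged; anything else is
// reinterpreted as ISO-8859-1 (an assumption) and converted to UTF-8.
function to_utf8_assuming_latin1(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Same conversion utf8_encode() performs, via the mbstring extension.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}
```

Every byte sequence is valid ISO-8859-1, so this fallback never fails; it can only guess wrong about the original encoding.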

I just want to be careful because I've read that character encoding attacks are possible.

EDIT: I guess I could array_map() or array_walk_recursive() or something else. The implementation doesn't matter; the idea is important.

David Ferenczy Rogožan
Scott
  • You do have the meta tag with UTF-8, yes? – Rick James Dec 18 '15 at 06:41
  • I vote for "too paranoid". How many _minutes_ would such a browser last? – Rick James Dec 18 '15 at 06:42
  • I do have the meta tag, and php content type header. My unicode setup is 100 percent working. I read however that problems can arise if another encoding made its way through. Not only data integrity but also unicode attacks. I guess i should probably read up on that. @rick james - custom browsers do exist by people who are looking to attack. – Scott Dec 18 '15 at 23:03
  • I would be interested to hear of "unicode attacks". – Rick James Dec 19 '15 at 01:03
  • @rick - such as this http://stackoverflow.com/questions/13022369/php-security-how-can-encoding-be-misused and I've read about something else it seems where an attacker would submit an invalid encoding in hopes the app didn't detect it (have to read more) – Scott Dec 19 '15 at 01:16
