My website is fully converted to use UTF-8 (MySQL, HTTP headers, PHP mbstring, etc.).
I'm doing some penetration testing and trying to POST invalid UTF-8 to one of the scripts (using BurpSuite).
But when I post the invalid UTF-8 and just hex-dump the $_POST variable, I see that the invalid byte sequence has already been sanitised before I try to validate it using mb_detect_encoding.
This sounds like good news for me, but I want to know which layer is transforming the POST data.
Is it a side effect of the Content-Type HTTP header, or is my web server (lighttpd) doing it? Or is PHP itself doing it when populating $_POST?
I expected to see the invalid UTF-8 bytes in the hex dump, leaving me to sanitise them myself.
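
One way to narrow down which layer changes the bytes might be to hex-dump the raw request body alongside each parsed $_POST field. The following is only a minimal sketch, assuming the mbstring extension is loaded and the request is not multipart/form-data (php://input is not available for that content type). The helper name inspect_bytes is hypothetical, and mb_check_encoding is used here as an assumption in place of mb_detect_encoding because it reports validity for a single named encoding.

```php
<?php
// Hypothetical helper: hex-dump a string and report whether it is valid UTF-8.
function inspect_bytes(string $value): array
{
    return [
        'hex'   => bin2hex($value),
        'valid' => mb_check_encoding($value, 'UTF-8'),
    ];
}

// Raw body as handed to PHP by the web server, before PHP builds $_POST.
// For application/x-www-form-urlencoded it is still percent-encoded.
$raw = (string) file_get_contents('php://input');
echo "raw body:  ", $raw, "\n";
echo "raw (hex): ", bin2hex($raw), "\n";

// The same data after PHP has parsed and URL-decoded it into $_POST.
foreach ($_POST as $name => $value) {
    if (!is_string($value)) {
        continue; // skip array-style fields such as field[]=...
    }
    $info = inspect_bytes($value);
    echo $name, ": ", $info['hex'], " valid=", $info['valid'] ? 'yes' : 'no', "\n";
}
```

If the invalid bytes are still present in the raw body (as percent-escapes) but absent from $_POST, the change happens inside PHP; if they are already gone from the raw body, it happens earlier, for example in the web server or in the testing proxy.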