My website is fully converted to use UTF-8 (MySQL, HTTP headers, PHP mbstring, etc.).

I'm doing some penetration testing and trying to POST invalid UTF-8 to one of the scripts (using BurpSuite).

But when I post the invalid UTF-8 and just hex-dump the $_POST var, I see that the invalid sequence has already been sanitised before I try to validate it using mb_detect_encoding.

This sounds like good news for me, but I want to know which layer is transforming the POST data.

Is it a side effect of the Content-Type HTTP header? Is my web server (lighttpd) doing it? Or is PHP itself doing it when populating $_POST?

I expected to see the invalid UTF-8 bytes in the hex dump, leaving me to sanitise them myself.
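For reference, a minimal sketch of the kind of check described above. The helper name `describe_input` and the sample byte strings are illustrative assumptions, not the code from the original test script; it uses `mb_check_encoding`, which tests validity directly, rather than `mb_detect_encoding`.

```php
<?php
// Hypothetical helper (not from the original post): describe a raw input string.
function describe_input(string $raw): array
{
    return [
        'hex'        => bin2hex($raw),                    // raw bytes as lowercase hex
        'valid_utf8' => mb_check_encoding($raw, 'UTF-8'), // true only for well-formed UTF-8
    ];
}

$euro    = "\xE2\x82\xAC"; // U+20AC EURO SIGN, a valid 3-byte sequence
$mangled = "\xE2\x82\x28"; // third byte is no longer a continuation byte

var_dump(describe_input($euro));
var_dump(describe_input($mangled));

// In the script under test, the same helper would be applied to the POST field:
// var_dump(describe_input($_POST['formvalue'] ?? ''));
```

For a URL-encoded form, `file_get_contents('php://input')` should return the request body as PHP received it, so comparing it with `$_POST` is one way to see whether the change happens before or during the population of `$_POST`.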

carpii
  • More information would help. Are you on a framework? Which PHP version are you using? Can we see a sample of your code? – Kyle Hudson Oct 24 '11 at 00:42
  • Can we see what you posted and what you got back? – Brad Oct 24 '11 at 00:49
  • No, no framework. Just vanilla PHP and a raw HTTP request from BurpSuite, and then the PHP script simply hex-dumps $_POST["formvalue"]. There's no preprocessing of $_REQUEST/$_POST or any user input in my code prior to hex-dumping it. Sorting out an example now... – carpii Oct 24 '11 at 00:52
  • Simple PHP example, and request/response dumps at http://carpii.homeip.net/utf_test.txt -- initial test is for a 3 byte sequence (a Euro character), where the 3rd byte has been mangled to be invalid – carpii Oct 24 '11 at 01:51

1 Answer

PHP itself does not filter the POST data; it handles it as binary data, which is always "valid" (it's just data, so there is nothing to validate).

I would therefore suspect that either some module in your web server is changing the data or some PHP extension is filtering it.

Check whether a web application firewall is installed with your web server, and go through the list of extensions you're loading with PHP to see if any of them is related to input filtering.

hakre
  • No framework, no web firewall, and apparently lighttpd does not try to filter invalid UTF-8 either. I'm pretty baffled. No strange extensions, although I am working through those. Do you know of any mbstring config settings which could be causing it? – carpii Nov 05 '11 at 01:16
  • mbstring has default encoding, sure. And you might have those registered on input and/or output. See [the **PHP Settings** and **String** sections in this answer](http://stackoverflow.com/q/6987929/367456#6989048). I listed some ini settings that play a role. To really look into the data you process, I often find a [hex dump of PHP strings](http://stackoverflow.com/q/1057572/367456) handy. – hakre Nov 05 '11 at 08:09
  • Thanks, I eventually found that it was being caused by a php.ini mbstring setting: mbstring.http_input = auto. When set to auto, it appears to silently convert the charset, which gave the impression that the invalid UTF-8 was being properly sanitised. More likely, though, the conversion was failing and returning a blank string. – carpii Nov 06 '11 at 02:15
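
To check whether a given installation is affected by the setting named in the last comment, the relevant directives can be read at runtime. This is a minimal sketch: the function name `mbstring_input_settings` is hypothetical, and including `mbstring.encoding_translation` and `mbstring.internal_encoding` in the list is an assumption about which other directives are worth inspecting, not something stated in the thread.

```php
<?php
// Hypothetical diagnostic (not from the thread): list mbstring settings
// that may affect how request input is converted.
function mbstring_input_settings(): array
{
    $names = [
        'mbstring.http_input',           // the directive identified in the comment above
        'mbstring.encoding_translation', // assumption: may also be relevant
        'mbstring.internal_encoding',    // assumption: likely conversion target
    ];
    $settings = [];
    foreach ($names as $name) {
        // ini_get() returns false when the directive is unknown to this build.
        $settings[$name] = ini_get($name);
    }
    return $settings;
}

print_r(mbstring_input_settings());
```

If `mbstring.http_input` is reported as `auto`, changing it in php.ini and re-running the hex-dump test should show whether the raw invalid bytes then reach `$_POST` unchanged.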