2

I recently learned that overlong encodings cause a security risk when not properly validated. From the answer in the previously mentioned post:

For example the character < is usually represented as byte 0x3C, but could also be represented using the overlong UTF-8 sequence 0xC0 0xBC (or even more redundant 3- or 4-byte sequences).

And:

If you take this input and handle it in a Unicode-oblivious byte-based tool, then any character processing step being used in that tool may be evaded.

This means that if I run htmlspecialchars on a string that uses overlong encoding, the output could still contain tags. I also assume that other characters (like " or ;) could be submitted in overlong form and used for SQL injection.
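To make the concern concrete, here is a minimal Python sketch (Python is used only for illustration, and the helper names `overlong2`, `lenient_decode` and `strict_decode_fails` are made up for this example). It builds the two-byte overlong form of an ASCII character, shows that a byte-level replacement of `<` finds nothing to escape, and contrasts a deliberately sloppy decoder with Python's built-in strict UTF-8 decoder, which rejects the sequence.

```python
def overlong2(ch: str) -> bytes:
    """Return the (invalid) two-byte overlong UTF-8 form of an ASCII character."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("only ASCII characters have a two-byte overlong form")
    # 110000xx 10xxxxxx: the payload bits are spread over two bytes
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def lenient_decode(data: bytes) -> str:
    """Hypothetical sloppy decoder that accepts two-byte overlong sequences."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # No check that the result is >= 0x80, which is exactly the bug
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append("\ufffd")
            i += 1
    return "".join(out)


def strict_decode_fails(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder rejects the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


payload = overlong2("<") + b"foo"                 # bytes C0 BC 66 6F 6F
byte_filtered = payload.replace(b"<", b"&lt;")    # byte-based escaping sees no 0x3C
```

The risk only materialises when a byte-oriented filter is followed by a decoder as permissive as `lenient_decode`; a strict decoder refuses the input instead of turning it into `<foo`.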

Perhaps it is just me, but I believe this is a security risk that relatively few people take into account or even know about. I've been coding for years and am only now finding this out.

Anyway, my question is: what tools can I use to send data with overlong encodings? People who are familiar with this risk: how do you perform tests on websites? I want to POST a bunch of overlong characters to my sites, but I have no idea how to do this.

In my situation I mostly use PHP and MySQL, but what I really want to know are testing tools, so I guess the back-end situation does not matter much.


2 Answers

3

I want to POST a bunch of overlong characters to my sites, but I have no idea how to do this.

Apart from testing it with manual request tools like curl, a simple workaround for in-browser testing is to override the encoding of the form submission. Using, e.g., Firebug or the Chrome debugger, alter the form you're testing to add the attribute:

accept-charset="iso-8859-1"

You can now type characters that, when encoded as Windows code page 1252(*), become the UTF-8 overlong byte sequence you want.

For example, enter cafÃ© into the form and you will get the byte sequence c a f 0xC3 0xA9, so the application will think you typed café. Enter À¼foo and the sequence 0xC0 0xBC f o o will be submitted, which could be interpreted as <foo. Note that you won't see <foo in any output page source, because modern browsers don't decode overlong UTF-8 sequences in web pages, but you might get a �foo or some other indication that something isn't right.
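If you want to work out which characters to type for a given byte sequence, a small Python sketch like the following can help. The function name `keystrokes_for` is invented for this example, and it assumes Python's `cp1252` codec matches what the browser does; Python leaves a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unmapped and raises an error for them, so this approach will not cover every possible sequence.

```python
def keystrokes_for(raw: bytes) -> str:
    """Characters to type so that a windows-1252 form submission yields `raw`."""
    return raw.decode("cp1252")


to_type = keystrokes_for(b"\xc0\xbcfoo")   # overlong '<' followed by "foo"

# Round trip for the café example: what is typed, what is sent,
# and what a UTF-8 application sees.
sent = "cafÃ©".encode("cp1252")
seen = sent.decode("utf-8")
```

Here `to_type` is the string À¼foo from the example above, and `seen` is café.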

For more in-depth access to doctor the input and check the output of a webapp, see dedicated sec tools like Burp.

bobince
  • Oops, forgot the footnote! (*): although it *says* `iso-8859-1`, web browsers treat this encoding as really meaning `windows-1252`, for tedious historical reasons. They are confusingly similar encodings but not the same. – bobince Apr 10 '13 at 19:48
0

To test whether your site is vulnerable, use curl to fetch your page with a POST request whose body contains overlong UTF-8 encoded data. (You could prepare the data in your text editor by setting its encoding to "utf8 long", so that both the text you post using curl and the PHP file are in the overlong form.)

http://php.net/manual/en/function.curl-setopt.php
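As a rough illustration of the same idea, here is a sketch that uses Python's standard library rather than the PHP cURL functions linked above. The URL and the form field name are placeholders, not anything from a real site, and the payload bytes are written out by hand so that no editor encoding is involved.

```python
import urllib.parse
import urllib.request

TARGET_URL = "http://localhost:8000/comment"  # hypothetical endpoint, adjust to your site
FIELD = "comment"                             # hypothetical form field name


def build_request(payload: bytes) -> urllib.request.Request:
    """Build a form POST whose field value is the given raw bytes, percent-encoded."""
    body = (FIELD + "=" + urllib.parse.quote_from_bytes(payload)).encode("ascii")
    return urllib.request.Request(
        TARGET_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )


# Overlong '<' (C0 BC) and '>' (C0 BE) around the word "script"
req = build_request(b"\xc0\xbcscript\xc0\xbe")

# To actually send it against a site you are allowed to test:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read()[:500])
```

Percent-encoding the raw bytes keeps the overlong sequence intact on the wire; what to look for in the response depends on how the application decodes and echoes the field.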

Tschallacka