I recently learned that overlong encodings pose a security risk when input is not properly validated. From the answer in the previously mentioned post:
For example the character < is usually represented as byte 0x3C, but could also be represented using the overlong UTF-8 sequence 0xC0 0xBC (or even more redundant 3- or 4-byte sequences).
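To make the byte values in that quote concrete, here is a minimal Python sketch that builds such sequences by hand. The helper name overlong_utf8 is made up for illustration; it simply packs the code point into a longer UTF-8 bit layout than necessary.

```python
def overlong_utf8(ch: str, length: int = 2) -> bytes:
    """Encode one character as a deliberately overlong UTF-8 sequence.

    Hypothetical helper for test data only; the result is invalid UTF-8.
    """
    if length not in (2, 3, 4):
        raise ValueError("length must be 2, 3 or 4")
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Only overlong if the shortest valid encoding uses fewer bytes.
    if len(ch.encode("utf-8")) >= length:
        raise ValueError("requested length would not be overlong")

    cp = ord(ch)
    # Lead byte prefixes: 110xxxxx, 1110xxxx, 11110xxx.
    prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}[length]
    out = [prefix | (cp >> (6 * (length - 1)))]
    # Continuation bytes carry 6 payload bits each: 10xxxxxx.
    for shift in range(6 * (length - 2), -1, -6):
        out.append(0x80 | ((cp >> shift) & 0x3F))
    return bytes(out)


print(overlong_utf8("<").hex(" "))     # c0 bc
print(overlong_utf8("<", 3).hex(" "))  # e0 80 bc
print(overlong_utf8("<", 4).hex(" "))  # f0 80 80 bc
```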
And:
If you take this input and handle it in a Unicode-oblivious byte-based tool, then any character processing step being used in that tool may be evaded.
This means that if I use htmlspecialchars on a string that uses overlong encoding, the output could still contain tags. I also assume that you could post similar characters (like " or ;) in the same way, which could then be used for SQL injection.
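My concern can be illustrated with a small Python sketch (not PHP, and not a claim about how htmlspecialchars itself behaves). The function names are hypothetical. It shows a purely byte-based check missing the overlong angle brackets, while a strict UTF-8 decoder refuses the same input.

```python
# Overlong two-byte forms of '<' (0xC0 0xBC) and '>' (0xC0 0xBE).
payload = b"\xc0\xbcscript\xc0\xbe"


def contains_raw_angle_bracket(data: bytes) -> bool:
    """Naive byte-level check, standing in for a Unicode-oblivious filter."""
    return b"<" in data or b">" in data


def is_strict_utf8(data: bytes) -> bool:
    """True only if the bytes decode under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The byte-level check sees nothing suspicious in the overlong payload.
print(contains_raw_angle_bracket(payload))  # False
# A strict decoder rejects the payload outright.
print(is_strict_utf8(payload))              # False
# Ordinary markup is caught by the byte check and is valid UTF-8.
print(contains_raw_angle_bracket(b"<b>"), is_strict_utf8(b"<b>"))  # True True
```

If this is right, then validating input as strict UTF-8 before any filtering would be the step that closes the gap.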
Perhaps it is just me, but I believe this is a security risk that relatively few people take into account or even know about. I've been coding for years and am only now finding this out.
Anyway, my question is: what tools can I use to send data with overlong encodings? People who are familiar with this risk: how do you perform tests on websites? I want to POST a bunch of overlong characters to my sites, but I have no idea how to do this.
In my situation I mostly use PHP and MySQL, but what I really want to know about is testing tools, so I guess the back end does not matter much.
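To make the kind of request concrete, here is a rough Python sketch of what I imagine a hand-rolled test would look like. The URL and field name are placeholders I made up, and I am assuming that percent-encoding the raw bytes in a form body is enough to get them to the server unchanged.

```python
from urllib.parse import quote_from_bytes
from urllib.request import Request

# Hypothetical endpoint and form field; substitute your own test site.
TARGET_URL = "http://localhost:8000/comment.php"
FIELD_NAME = "comment"

# Overlong two-byte forms of '<' (0xC0 0xBC) and '>' (0xC0 0xBE).
raw_value = b"\xc0\xbcscript\xc0\xbe"

# Percent-encode the raw bytes so the invalid sequences are sent as-is.
body = (FIELD_NAME + "=" + quote_from_bytes(raw_value)).encode("ascii")

request = Request(
    TARGET_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
    method="POST",
)

print(body)

# Sending is left commented out so the sketch has no side effects:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.status, response.read()[:200])
```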