Un-Escape String received via Post Data

Question

I am writing a program that contains a small, self-written HTTP server. I receive data via POST over the socket. How do I best unescape the string in C++? I get data like:

command=foo%26bar

but I want it to be

command=foo&bar

What's the best way to achieve this in C++?

EDIT: If someone is interested in my solution, here it is:

void HttpServer::UnescapePostData(std::string & data) {
    size_t pos;    
    while ((pos = data.find("+")) != std::string::npos) {
        data.replace(pos, 1, " ");
    }
    while ((pos = data.find("%")) != std::string::npos) {
        if (pos <= data.length() - 3) {
            char replace[2] = {(char)(std::stoi("0x" + data.substr(pos+1,2), NULL, 16)), '\0'};
            data.replace(pos, 3, replace);
        }
    }
}

That looks like **un**-escaping. Also, beware of wild polar bears. – Bartek Banachewicz Dec 17 '14 at 11:57
Additionally, your code has an obvious bug. If you %-escape the % character itself, and in the original string the % character is followed by two hexadecimal characters, your decoding will be wrong. – Sam Varshavchik Dec 20 '14 at 01:37
Indeed, you are right! I need to fix it so that data.find starts searching after the last replaced %... Thank you! – Nidhoegger Dec 20 '14 at 18:11
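
The fix discussed in these comments could look something like the sketch below. It is a free-function variant of the question's UnescapePostData, not the author's final code; the HexValue helper is a name introduced here for illustration. The search resumes after each replaced character, so a decoded % is never decoded a second time, and a malformed or trailing % is left as-is instead of looping forever.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns 0-15 for a hex digit, -1 for anything else.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Free-function variant of the question's UnescapePostData (assumption:
// the input is a single already-split value, decoded in place).
void UnescapePostData(std::string& data) {
    // '+' encodes a space. Doing this first keeps a decoded %2B as '+'.
    for (char& c : data) {
        if (c == '+') c = ' ';
    }
    std::size_t pos = 0;
    while ((pos = data.find('%', pos)) != std::string::npos) {
        if (pos + 2 < data.size()) {
            const int hi = HexValue(data[pos + 1]);
            const int lo = HexValue(data[pos + 2]);
            if (hi >= 0 && lo >= 0) {
                // Replace the three characters "%xx" with the single decoded byte.
                data.replace(pos, 3, 1, static_cast<char>(hi * 16 + lo));
            }
        }
        // Resume after the byte just written, or after a malformed '%'.
        ++pos;
    }
}
```

With this version, an input of %2541 decodes to %41 rather than to A.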

score 3 · Accepted Answer · answered Dec 17 '14 at 12:10


Well, there is no formal definition of the right terminology, but this kind of process is generally described as "unescaping" or "parsing" rather than escaping. You would like to parse an application/x-www-form-urlencoded string.

And the answer is rather boring: you just do it. That's all. application/x-www-form-urlencoded only does two things: it replaces spaces with "+" signs, and it replaces most other kinds of punctuation (including the real "+" sign itself) with %xx, where xx is the octet in hexadecimal.
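
To make the two rules concrete, here is a minimal sketch of the encoding direction. FormUrlEncode is a hypothetical name, and the set of characters left unencoded (letters, digits, and - _ . *) is an assumption about what a typical client sends; real clients may escape slightly different sets.

```cpp
#include <string>

// Hypothetical helper: encodes one value as application/x-www-form-urlencoded.
std::string FormUrlEncode(const std::string& in) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        // Assumed "safe" set that is passed through unchanged.
        const bool safe =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '*';
        if (safe) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';               // rule 1: space becomes '+'
        } else {
            out += '%';               // rule 2: everything else becomes %xx
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Under these assumptions, foo&bar encodes to foo%26bar, which is the form shown in the question.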

So, you just roll up your sleeves, and do it. Scan the string, replace the + character with a space, and replace each occurrence of %xx with the single character whose value is the hexadecimal octet xx. There's nothing particularly mysterious about the process. It is exactly what it appears to be.
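
The scan described above can be sketched as a single pass that builds a new string. This is one possible std::string-specific version, not the code from any particular library; FormUrlDecode and HexDigit are names introduced here for illustration. Because every input character is visited exactly once, a decoded % can never be mistaken for the start of another escape.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns 0-15 for a hex digit, -1 for anything else.
static int HexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes one application/x-www-form-urlencoded value.
std::string FormUrlDecode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == '+') {
            out += ' ';
        } else if (c == '%' && i + 2 < in.size()) {
            const int hi = HexDigit(in[i + 1]);
            const int lo = HexDigit(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
            } else {
                out += c;  // malformed escape: keep the '%' literally
            }
        } else {
            out += c;
        }
    }
    return out;
}
```

For a full POST body, split on & and = first and decode each name and value separately; otherwise a decoded %26 becomes indistinguishable from a real field separator.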

answered Dec 17 '14 at 12:10

Sam Varshavchik


That's what I suspected. I was just wondering if there is an easier (or more comfortable) way to achieve this. – Nidhoegger Dec 17 '14 at 14:29
I just looked it up, and in my library, decoding an application/x-www-form-urlencoded string takes exactly 74 lines of code, and it takes that long only because it's a template-based algorithm, usable for decoding arbitrary x-www-form-urlencoded input sequences, rather than one hand-tailored for a specific container (it could probably be done in about 10 lines of std::string-specific code). If, instead of asking this 11 hours ago, you had started working on it, I'm sure you would've been finished by now and would have working code that you can use. – Sam Varshavchik Dec 18 '14 at 02:12
Look in the first post dude. – Nidhoegger Dec 19 '14 at 13:52
