I really want to know how web servers convert percent-encoded UTF-8 characters in a URL to Unicode. How do they handle problems such as double URL encoding and non-shortest-form UTF-8 sequences, as explained here?
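
To make "non-shortest form" concrete: the byte pair C0 AE is an overlong two-byte spelling of '.', which should only ever appear as the single byte 2E. The following is a minimal sketch of a check that rejects such sequences. It is not taken from Apache or nginx, and `isWellFormedUtf8` is a hypothetical name chosen for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true only if `bytes` is well-formed UTF-8.
// Overlong (non-shortest-form) sequences, UTF-16 surrogates and values
// above U+10FFFF are rejected.
bool isWellFormedUtf8(const std::string &bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;      // number of continuation bytes expected
        unsigned long cp;       // code point being assembled
        unsigned long minimum;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        minimum = 0x0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minimum = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minimum = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minimum = 0x10000; }
        else return false;      // stray continuation byte or invalid lead byte

        if (bytes.size() - i <= extra) return false;  // truncated sequence

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minimum) return false;                  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

With this sketch, `isWellFormedUtf8("\xC0\xAE")` is rejected as overlong, while the bytes D8 A7 from the example URL are accepted.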

For example: http://www.example.com/dir1/index.html?name=%D8%A7%D9%84%D8%A7%D8%B3%D9%85%D8%A7

to http://www.example.com/dir1/index.html?name=الاسما
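
Once the percent escapes are turned back into bytes, "converting to Unicode" means assembling each UTF-8 sequence into a code point. A rough sketch of that step follows. `utf8ToCodePoints` is a hypothetical helper, and it assumes the bytes are already well-formed UTF-8, so it does no validation of its own.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turns UTF-8 bytes into Unicode code points.
// Assumes the input is already well-formed UTF-8; nothing is validated here.
std::vector<char32_t> utf8ToCodePoints(const std::string &bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // continuation bytes that follow the lead byte
        char32_t cp = lead;     // a single-byte (ASCII) character is its own value
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra && i + k < bytes.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the `name` value above, the bytes D8 A7 give U+0627 and D9 84 give U+0644, the first two letters of the decoded text.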

I wrote a C++ program that does this conversion, but in general I want to know how web servers like Apache or nginx do it.

MSH

1 Answer

Do you mean something like this?

From: Encode/Decode URLs in C++

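#include <cctype>
#include <iostream>
#include <string>

// Value of a single hex digit; the caller must check std::isxdigit first.
static int hexValue(unsigned char c) {
    if (std::isdigit(c)) return c - '0';
    return std::tolower(c) - 'a' + 10;
}

// Decodes %XX escapes into raw bytes. A '%' that is not followed by two
// hex digits is copied through unchanged instead of being misread.
std::string urlDecode(const std::string &src) {
    std::string ret;
    ret.reserve(src.length());
    for (std::string::size_type i = 0; i < src.length(); ++i) {
        if (src[i] == '%' && i + 2 < src.length() &&
            std::isxdigit(static_cast<unsigned char>(src[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(src[i + 2]))) {
            int value = hexValue(src[i + 1]) * 16 + hexValue(src[i + 2]);
            ret += static_cast<char>(value);
            i += 2;  // skip the two hex digits just consumed
        } else {
            ret += src[i];
        }
    }
    return ret;
}

int main()
{
    std::string s = "http://www.example.com/dir1/index.html?name=%D8%A7%D9%84%D8%A7%D8%B3%D9%85%D8%A7";
    // The result is the raw UTF-8 byte sequence; it is not validated here.
    std::cout << urlDecode(s) << '\n';
}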
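
On the double-encoding concern from the question: a decoder like this makes exactly one pass, so `%252E` becomes `%2E`, not `.`. Decoding a second time is what lets a doubly encoded `../` slip past a check that ran after the first pass. The sketch below illustrates the difference. `percentDecodeOnce` and `looksDoubleEncoded` are hypothetical names, and this is an illustration under those assumptions, not a description of what Apache or nginx do internally.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: one pass of percent-decoding. Malformed escapes are
// copied through unchanged.
std::string percentDecodeOnce(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size() &&
            std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            // Both characters are hex digits, so std::stoi cannot throw here.
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical check: true if a second pass would still change the result,
// which is one sign that the input may have been encoded twice.
bool looksDoubleEncoded(const std::string &in) {
    std::string once = percentDecodeOnce(in);
    return percentDecodeOnce(once) != once;
}
```

Here `percentDecodeOnce("%252E%252E%252F")` yields `%2E%2E%2F`, and only a second pass would turn that into `../`. Whether to reject such input or pass it along as literal text is a policy decision for the application.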
Community
technusm1
  • And what happens if the string is malformed? And what is the magic number `37`? And why the conversion _to_ int in the `if`? And why isn't the argument type `std::string const&`, so that you can pass it a temporary? – James Kanze Jan 15 '15 at 11:30
  • magic number 37 is '%', i.e. if it finds a '%' sign it begins the conversion stuff – technusm1 Jan 15 '15 at 11:46
  • So why obfuscate with `37`? Why not write `'%'`? – James Kanze Jan 15 '15 at 12:31
  • You're right, sir. It's just that when I found that function on the other page, I thought it wouldn't make much difference to leave 37 as it is. I guess it does. – technusm1 Jan 15 '15 at 12:35