
I am trying to encode a request. The request is as follows:

https://www.overpass-api.de/api/interpreter?data=area["name"="Nicaragua"]["admin_level"="2"]->.boundaryarea;(node["type"="route"]["route"="bus"](area.boundaryarea);way["type"="route"]["route"="bus"](area.boundaryarea);>;relation["type"="route"]["route"="bus"](area.boundaryarea);>>;);out meta;

As you can see, it contains a lot of special characters. If I give this URL to curl, it won't process it because of some of those characters. Hence I decided to encode the URL, both with my own method and with curl's method. Here is the code sample to encode with curl:

std::string d = ...;
CURL *curl = curl_easy_init();
if (curl) {
  // curl_easy_escape() takes the length as an int
  char *output = curl_easy_escape(curl, d.c_str(), static_cast<int>(d.length()));
  if (output) {
    printf("Encoded: %s\n", output);
    curl_free(output);
  }
  curl_easy_cleanup(curl);
}

This encodes the whole request, resulting in something like:

https%3A%2F%2Fwww.overpass-api.de%2Fapi%2Finterpreter%3Fdata%3D ...

If I then give it to curl, it fails and says that it cannot resolve the host, which makes sense to me. So I decided to check what Chrome does when encoding it, using the dev tools. This is what it looks like:

https://www.overpass-api.de/api/interpreter?data=area[%22name%22=%22Nicaragua%22][%22admin_level%22=%222%22]-%3E.boundaryarea;(node[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);way[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E;relation[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E%3E;);out%20meta;

And if I give this to curl as is, it processes it properly.

Why are some characters encoded and not the rest? And why does curl accept it this way?

EDIT: and more importantly, how can I replicate that in my code?

trexgris
  • The difference between your encoding and chrome's appears to be that "general delimiters" aren't encoded (and don't need to be). https://stackoverflow.com/questions/1856785/characters-allowed-in-a-url – avariant Apr 12 '19 at 21:49

2 Answers


You must escape the individual URI parts rather than the whole URL. Have a look at JavaScript's encodeURI() and encodeURIComponent() functions; they show the way to go.
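To make the difference between the two JavaScript functions concrete, here is a minimal C++ sketch. The helper names percent_encode, encode_uri_component and encode_uri are hypothetical and not part of any library. It assumes the input is already a UTF-8 byte string, whereas the JavaScript functions convert from UTF-16 first.

```cpp
#include <string>

// Hypothetical helper: percent-encode every byte of `in` that is not an
// ASCII letter, an ASCII digit, or listed in `extra_safe`.
std::string percent_encode(const std::string& in, const std::string& extra_safe) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : in) {
        unsigned char uc = static_cast<unsigned char>(c);
        bool alnum = (uc >= '0' && uc <= '9') ||
                     (uc >= 'A' && uc <= 'Z') ||
                     (uc >= 'a' && uc <= 'z');
        if (alnum || extra_safe.find(c) != std::string::npos) {
            out += c;              // safe: copy through unchanged
        } else {
            out += '%';            // unsafe: emit %XX with uppercase hex
            out += hex[uc >> 4];
            out += hex[uc & 0xF];
        }
    }
    return out;
}

// encodeURIComponent() leaves only these marks unescaped (plus alphanumerics),
// so it is meant for a single component such as one query value.
std::string encode_uri_component(const std::string& in) {
    return percent_encode(in, "-_.!~*'()");
}

// encodeURI() additionally leaves URI delimiters and '#' unescaped,
// so it can be applied to a complete URL without breaking its structure.
std::string encode_uri(const std::string& in) {
    return percent_encode(in, "-_.!~*'();,/?:@&=+$#");
}
```

For example, encode_uri_component("a b&c=d/e") gives a%20b%26c%3Dd%2Fe, while encode_uri("a b&c=d/e") gives a%20b&c=d/e, because the second function keeps the delimiters intact.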

I am using the following function, which mimics JavaScript's encodeURIComponent, to encode the individual parts:

#include <iomanip>
#include <sstream>
#include <string>

std::string encodeURIComponent(std::string const& value)
{
    std::ostringstream oss;
    // Emit hex digits uppercase and zero-padded, so 0x0A becomes "%0A" rather than "% A".
    oss << std::hex << std::uppercase << std::setfill('0');
    for (auto c : value) {
        int uc = static_cast<unsigned char>(c);
        // ASCII digits and letters pass through unchanged.
        if (((0x30 <= uc) && (uc <= 0x39)) || ((0x41 <= uc) && (uc <= 0x5A)) || ((0x61 <= uc) && (uc <= 0x7A))) {
            oss << c;
            continue;
        }
        switch (c) {
        // Punctuation that encodeURIComponent() leaves unescaped.
        case '-': case '_': case '.': case '!': case '~':
        case '*': case '\'': case '(': case ')':
            oss << c;
            break;
        default:
            // Everything else becomes %XX.
            oss << '%' << std::setw(2) << uc;
            break;
        }
    }
    return oss.str();
}
ezegoing

Do not escape the entire URL as a single string. Escape only the individual pieces that actually need to be escaped, like query parameters. But even then, in name=value pairs, escape the name and value separately as needed, otherwise the delimiting = within the name=value pair, and the delimiting & between pairs, will get escaped, which you don't want to happen.

Try something more like this:

std::string query_encode(const std::string &s)
{
    std::string ret;

    // curl_easy_escape() escapes way more than it needs to in
    // a URL Query component! Which is not TECHNICALLY wrong, but
    // it won't produce the output you are expecting...
    /*
    char *output = curl_easy_escape(curl, s.c_str(), s.length());
    if (output) {
        ret = output;
        curl_free(output);
    }
    */

    #define IS_BETWEEN(ch, low, high) (ch >= low && ch <= high)
    #define IS_ALPHA(ch) (IS_BETWEEN(ch, 'A', 'Z') || IS_BETWEEN(ch, 'a', 'z'))
    #define IS_DIGIT(ch) IS_BETWEEN(ch, '0', '9')
    #define IS_HEXDIG(ch) (IS_DIGIT(ch) || IS_BETWEEN(ch, 'A', 'F') || IS_BETWEEN(ch, 'a', 'f'))

    for(size_t i = 0; i < s.size();)
    {
        char ch = s[i++];

        if (IS_ALPHA(ch) || IS_DIGIT(ch))
        {
            ret += ch;
        }
        else if ((ch == '%') && IS_HEXDIG(s[i+0]) && IS_HEXDIG(s[i+1]))
        {
            ret += s.substr(i-1, 3);
            i += 2;
        }
        else
        {
            switch (ch)
            {
                case '-':
                case '.':
                case '_':
                case '~':
                case '!':
                case '$':
                case '&':
                case '\'':
                case '(':
                case ')':
                case '*':
                case '+':
                case ',':
                case ';':
                case '=':
                case ':':
                case '@':
                case '/':
                case '?':
                case '[':
                case ']':
                    ret += ch;
                    break;

                default:
                {
                    static const char hex[] = "0123456789ABCDEF";
                    char pct[] = "%  ";
                    pct[1] = hex[(ch >> 4) & 0xF];
                    pct[2] = hex[ch & 0xF];
                    ret.append(pct, 3);
                    break;
                }
            }
        }
    }

    return ret;
}

std::string d = "https://www.overpass-api.de/api/interpreter?data=" + query_encode("area[\"name\"=\"Nicaragua\"][\"admin_level\"=\"2\"]->.boundaryarea;(node[\"type\"=\"route\"][\"route\"=\"bus\"](area.boundaryarea);way[\"type\"=\"route\"][\"route\"=\"bus\"](area.boundaryarea);>;relation[\"type\"=\"route\"][\"route\"=\"bus\"](area.boundaryarea);>>;);out meta;");

std::cout << "Encoded: " + d + "\n";


Output:

https://www.overpass-api.de/api/interpreter?data=area[%22name%22=%22Nicaragua%22][%22admin_level%22=%222%22]-%3E.boundaryarea;(node[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);way[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E;relation[%22type%22=%22route%22][%22route%22=%22bus%22](area.boundaryarea);%3E%3E;);out%20meta;
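The advice about escaping the name and the value separately can be sketched as follows. This is only an illustration under stated assumptions: encode_strict and build_query are hypothetical names, and encode_strict deliberately escapes everything except the RFC 3986 unreserved characters. That is stricter than query_encode() above, so its output will not match Chrome's byte for byte, but the delimiting = and & always survive.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: keep only RFC 3986 unreserved characters
// (letters, digits, '-', '.', '_', '~') and percent-encode the rest.
std::string encode_strict(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : in) {
        unsigned char uc = static_cast<unsigned char>(c);
        bool unreserved = (uc >= '0' && uc <= '9') ||
                          (uc >= 'A' && uc <= 'Z') ||
                          (uc >= 'a' && uc <= 'z') ||
                          c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += c;
        } else {
            out += '%';
            out += hex[uc >> 4];
            out += hex[uc & 0xF];
        }
    }
    return out;
}

// Hypothetical helper: encode each name and each value on its own, then join
// them with literal '=' and '&' so those delimiters are never escaped.
std::string build_query(const std::vector<std::pair<std::string, std::string>>& params) {
    std::string query;
    for (const auto& p : params) {
        if (!query.empty()) {
            query += '&';
        }
        query += encode_strict(p.first);
        query += '=';
        query += encode_strict(p.second);
    }
    return query;
}
```

Calling build_query({{"data", "out meta;"}, {"a&b", "c=d"}}) yields data=out%20meta%3B&a%26b=c%3Dd: the & and = inside the name and value are escaped, while the ones that separate the pairs are not.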

Why some characters are encoded and not the rest?

The rules are covered by RFC 3986, in particular Section 2 "Characters" and its sub-sections 2.1 - 2.5. The Query component is covered by Section 3.4.
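As a rough illustration of those sections, the character classes from RFC 3986 Section 2 and the query grammar from Section 3.4 can be written down directly. The names CharClass, classify and allowed_raw_in_query are hypothetical, and the sketch ignores the pct-encoded production (an existing %XX triplet).

```cpp
#include <cstring>

enum class CharClass { Unreserved, SubDelim, GenDelim, Other };

// Classify one character according to RFC 3986 Sections 2.2 and 2.3.
CharClass classify(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    if ((uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z') || (uc >= '0' && uc <= '9')) {
        return CharClass::Unreserved;
    }
    if (c == '\0') {
        return CharClass::Other;  // strchr would otherwise match the terminator
    }
    if (std::strchr("-._~", c)) {
        return CharClass::Unreserved;
    }
    if (std::strchr("!$&'()*+,;=", c)) {
        return CharClass::SubDelim;
    }
    if (std::strchr(":/?#[]@", c)) {
        return CharClass::GenDelim;
    }
    return CharClass::Other;
}

// Section 3.4: query = *( pchar / "/" / "?" ), where
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@".
bool allowed_raw_in_query(char c) {
    CharClass k = classify(c);
    return k == CharClass::Unreserved || k == CharClass::SubDelim ||
           c == ':' || c == '@' || c == '/' || c == '?';
}
```

Under this reading, characters such as ", > and the space are not allowed raw in a query, which is why they appear as %22, %3E and %20 in the output above. Going strictly by the grammar, [ and ] are not allowed raw either, since they are general delimiters outside the query production. Chrome and query_encode() leave them unescaped, and the question reports that this form of the URL is accepted.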

Remy Lebeau