Encode/Decode URLs in C++

Question

Does anyone know of any good C++ code that does this?

How about accepting an answer? – gsamaras Jul 13 '16 at 22:48

xperroni · Answer 1 · 2022-05-26T11:34:18.297

97

I faced the encoding half of this problem the other day. Unhappy with the available options, and after taking a look at this C sample code, I decided to roll my own C++ URL-encode function:

#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

using namespace std;

string url_encode(const string &value) {
    ostringstream escaped;
    escaped.fill('0');
    escaped << hex;

    for (string::const_iterator i = value.begin(), n = value.end(); i != n; ++i) {
        string::value_type c = (*i);

        // Keep alphanumeric and other accepted characters intact.
        // The cast matters: passing a negative char to isalnum is undefined behavior.
        if (isalnum((unsigned char) c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << c;
            continue;
        }

        // Any other characters are percent-encoded
        escaped << uppercase;
        escaped << '%' << setw(2) << int((unsigned char) c);
        escaped << nouppercase;
    }

    return escaped.str();
}

The implementation of the decode function is left as an exercise to the reader. :P

edited May 26 '22 at 11:34

answered Jul 17 '13 at 19:40

xperroni

I believe it's more generic (more generally correct) to replace ' ' with "%20". I've updated the code accordingly; feel free to roll back if you disagree. – Josh Kelley Jul 15 '14 at 17:48
Nah, I agree. Also took the chance to remove that pointless `setw(0)` call (at the time I thought minimal width would remain set until I changed it back, but in fact it is reset after the next input). – xperroni Jul 15 '14 at 22:19
Actually, this does not convert '+' to space so it fails. – xryl669 Feb 03 '15 at 08:55
Don't you mean it the other way around – convert _space to '+'_? Anyway, read the first comment by Josh Kelley: spaces are being converted to '%20', which is just as well. – xperroni Feb 04 '15 at 02:36
I had to add std::uppercase to the line "escaped << '%' << std::uppercase << std::setw(2) << int((unsigned char) c);" In case other people are wondering why this returns for example %3a instead of %3A – gmm Sep 11 '15 at 09:06
@gumlym, that is not necessary. According to [RFC 3986](https://tools.ietf.org/html/rfc3986): "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively." – oferei Sep 29 '15 at 14:12
Well, I had to do that when URL-encoding a string to create a signature for Amazon AWS: it didn't work with lowercase digits, but it did work with uppercase. That is why I posted it. – gmm Sep 29 '15 at 14:21
In fact the RFC also says that "[f]or consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings." So it is a reasonable change. – xperroni Sep 30 '15 at 05:13
One thing we should add for posterity. This answer handles Unicode! There are many such solutions out there that do not. – Jonathan Henson Feb 19 '16 at 21:58
I don't see the point of checking for "c >= 0" in the conditional though. Characters are coming from a text string, which ought to contain only valid character codes. The only way it would contain negative values was if the string was corrupted, or if an input was deliberately crafted to break the function – and in both cases you might as well _want_ the program to terminate. – xperroni Feb 21 '16 at 01:22
It looks wrong because UTF-8 strings are not supported (http://www.w3schools.com/tags/ref_urlencode.asp). It seems to work only for Windows-1252 – Skywalker13 Dec 01 '16 at 16:32
The problem was just `isalnum(c)`, it must be changed to `isalnum((unsigned char) c)` – Skywalker13 Dec 01 '16 at 16:44
Notice the character type is already parameterized to the string's `value_type`. If you want to support UTF-8 the correct change is to replace references to `std::string` with e.g. `std::u8string` in C++ 20. – xperroni Dec 20 '18 at 20:15
The initialization of the for loop could be replaced with `for (string::value_type c: value)`, along with stripping the first instruction. – Stavros Avramidis Jun 06 '19 at 15:07
Yes, it could. Feel free to add a new answer with your own solution. – xperroni May 26 '22 at 11:38
Do you have a decoder method? – David G Dec 15 '22 at 01:05
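
Combining two suggestions from the comments above (the range-based for loop and the `unsigned char` cast before `isalnum`), the encoder could be sketched as follows. This is not xperroni's code: the name `url_encode_range` is made up for illustration, and the sketch assumes the input is a plain byte string (for example UTF-8) in which every byte outside the unreserved set should be escaped.

```cpp
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical name; percent-encodes every byte outside the unreserved set
// (letters, digits, '-', '_', '.', '~').
std::string url_encode_range(const std::string &value) {
    std::ostringstream escaped;
    escaped.fill('0');
    escaped << std::hex << std::uppercase;

    for (char ch : value) {
        // Convert to unsigned char first: a negative char passed to isalnum is undefined.
        const unsigned char c = static_cast<unsigned char>(ch);

        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << ch;
        } else {
            // setw applies to the next insertion only, so it is repeated for each byte.
            escaped << '%' << std::setw(2) << static_cast<int>(c);
        }
    }
    return escaped.str();
}
```

Calling `url_encode_range("a b")` gives `a%20b`; multi-byte UTF-8 input is escaped one byte at a time.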

score 92 · Answer 2 · edited Aug 20 '23 at 14:40

92

Answering my own question...

libcurl has curl_easy_escape for encoding.

For decoding, curl_easy_unescape.

#include <string>
#include <curl/curl.h>

std::string url_encode(const std::string& decoded)
{
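    const auto encoded_value = curl_easy_escape(nullptr, decoded.c_str(), static_cast<int>(decoded.length()));
    // curl_easy_escape returns NULL on failure; building a std::string from NULL is undefined
    if (encoded_value == nullptr) return std::string();
    std::string result(encoded_value);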
    curl_free(encoded_value);
    return result;
}

std::string url_decode(const std::string& encoded)
{
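    int output_length = 0;
    const auto decoded_value = curl_easy_unescape(nullptr, encoded.c_str(), static_cast<int>(encoded.length()), &output_length);
    // curl_easy_unescape also returns NULL on failure
    if (decoded_value == nullptr) return std::string();
    std::string result(decoded_value, output_length);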
    curl_free(decoded_value);
    return result;
}

edited Aug 20 '23 at 14:40

BullyWiiPlaza

answered Sep 30 '08 at 19:41

user126593

You should accept this answer so it is shown at the top (and people can find it easier). – Mouagip Nov 04 '15 at 13:47
you need to use curl for this to work and have to free the memory – xinthose Nov 05 '17 at 01:04
Related question: why does curl's unescape not handle changing '+' to space? Isn't that standard procedure when URL decoding? – Stéphane May 27 '19 at 06:59
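
On the '+' question: a '+' stands for a space only in `application/x-www-form-urlencoded` data (HTML form submissions and the query strings built from them). In plain RFC 3986 percent-encoding it is an ordinary reserved character, which is presumably why a general-purpose unescape function leaves it alone. If the input is known to be form data, one option is to translate '+' yourself while the text is still encoded, and only then unescape it. A minimal sketch, where `plus_to_space` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: replaces every literal '+' with a space in text that is
// STILL percent-encoded. An escaped plus ("%2B") is untouched here and becomes
// a real '+' in the later unescape step.
std::string plus_to_space(std::string encoded) {
    std::replace(encoded.begin(), encoded.end(), '+', ' ');
    return encoded;
}
```

With the `url_decode` wrapper from the answer above, form data would then be decoded as `url_decode(plus_to_space(value))`. Doing the replacement after unescaping would be wrong, because it would also turn an originally escaped `%2B` into a space.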

score 17 · Answer 3 · edited Sep 09 '22 at 14:18

17

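#include <cstdio>
#include <string>

using namespace std;

string urlDecode(const string &SRC) {
    string ret;
    char ch;
    unsigned int ii; // sscanf's %x expects an unsigned int
    for (string::size_type i=0; i<SRC.length(); i++) {
        // Decode only when sscanf actually parsed a hex value after the '%'
        if (SRC[i]=='%' && sscanf(SRC.substr(i+1,2).c_str(), "%x", &ii)==1) {
            ch=static_cast<char>(ii);
            ret+=ch;
            i=i+2;
        } else {
            ret+=SRC[i];
        }
    }
    return (ret);
}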

not the best, but working fine ;-)

edited Sep 09 '22 at 14:18

Olaf73

answered Jan 28 '11 at 00:52

Of course you should use `'%'` instead of `37`. – John Zwinck May 27 '14 at 13:05
This does not convert '+' to space – xryl669 Feb 03 '15 at 08:55

score 12 · Answer 4 · answered Aug 15 '14 at 22:36

12

cpp-netlib has functions

namespace boost {
  namespace network {
    namespace uri {    
      inline std::string decoded(const std::string &input);
      inline std::string encoded(const std::string &input);
    }
  }
}

They make it very easy to encode and decode URL strings.

answered Aug 15 '14 at 22:36

Yuriy Petrovskiy

omg thank you. the documentation on cpp-netlib is sparse. Do you have any links to good cheat sheets? – user249806 May 13 '17 at 13:12

score 11 · Answer 5 · edited May 25 '17 at 14:47

[Necromancer mode on]
I stumbled upon this question while looking for a fast, modern, platform-independent and elegant solution. I didn't like any of the above; cpp-netlib would be the winner, but it has a horrific memory vulnerability in its "decoded" function. So I came up with a Boost Spirit Qi/Karma solution.

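#include <iterator>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>

namespace bsq = boost::spirit::qi;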
namespace bk = boost::spirit::karma;
bsq::int_parser<unsigned char, 16, 2, 2> hex_byte;
template <typename InputIterator>
struct unescaped_string
    : bsq::grammar<InputIterator, std::string(char const *)> {
  unescaped_string() : unescaped_string::base_type(unesc_str) {
    unesc_char.add("+", ' ');

    unesc_str = *(unesc_char | "%" >> hex_byte | bsq::char_);
  }

  bsq::rule<InputIterator, std::string(char const *)> unesc_str;
  bsq::symbols<char const, char const> unesc_char;
};

template <typename OutputIterator>
struct escaped_string : bk::grammar<OutputIterator, std::string(char const *)> {
  escaped_string() : escaped_string::base_type(esc_str) {

    esc_str = *(bk::char_("a-zA-Z0-9_.~-") | "%" << bk::right_align(2,0)[bk::hex]);
  }
  bk::rule<OutputIterator, std::string(char const *)> esc_str;
};

The above can be used as follows:

std::string unescape(const std::string &input) {
  std::string retVal;
  retVal.reserve(input.size());
  typedef std::string::const_iterator iterator_type;

  char const *start = "";
  iterator_type beg = input.begin();
  iterator_type end = input.end();
  unescaped_string<iterator_type> p;

  if (!bsq::parse(beg, end, p(start), retVal))
    retVal = input;
  return retVal;
}

std::string escape(const std::string &input) {
  typedef std::back_insert_iterator<std::string> sink_type;
  std::string retVal;
  retVal.reserve(input.size() * 3);
  sink_type sink(retVal);
  char const *start = "";

  escaped_string<sink_type> g;
  if (!bk::generate(sink, g(start), input))
    retVal = input;
  return retVal;
}

[Necromancer mode off]

EDIT01: fixed the zero padding stuff - special thanks to Hartmut Kaiser
EDIT02: Live on CoLiRu

What's the “horrific memory vulnerability” of `cpp-netlib`? Can you provide a brief explanation or a link? — Craig M. Brandenburg, Jul 07 '15 at 21:26
It (the problem) was already reported, so I didn't report it and actually don't remember the details... something like an access violation when trying to parse an invalid escape sequence. – kreuzerkrieg Jul 08 '15 at 14:47
oh, here you go https://github.com/cpp-netlib/cpp-netlib/issues/501 — kreuzerkrieg, Jul 08 '15 at 14:50
I suggest using uint_parser instead of int_parser. As it is, you would probably accept a - sign. – sandwood Jun 14 '22 at 09:43

score 11 · Answer 6 · edited Sep 15 '15 at 09:00

11

Ordinarily, adding '%' to the int value of a char will not work when encoding; the value is supposed to be the hex equivalent, e.g. '/' is '%2F', not '%47'.

I think this is the best and most concise solution for both URL encoding and decoding (not many header dependencies).

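#include <cctype>
#include <cstdio>
#include <cstring>
#include <string>

using namespace std;

string urlEncode(string str){
    string new_str = "";
    unsigned char c; // unsigned, so bytes >= 0x80 are not sign-extended in isalnum/sprintf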
    int ic;
    const char* chars = str.c_str();
    char bufHex[10];
    int len = strlen(chars);

    for(int i=0;i<len;i++){
        c = chars[i];
        ic = c;
        // uncomment this if you want to encode spaces with +
        /*if (c==' ') new_str += '+';   
        else */if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') new_str += c;
        else {
            sprintf(bufHex,"%X",c);
            if(ic < 16) 
                new_str += "%0"; 
            else
                new_str += "%";
            new_str += bufHex;
        }
    }
    return new_str;
 }

string urlDecode(string str){
    string ret;
    char ch;
    int i, len = str.length();
    unsigned int ii = 0; // sscanf's %x expects an unsigned int

    for (i=0; i < len; i++){
        if(str[i] != '%'){
            if(str[i] == '+')
                ret += ' ';
            else
                ret += str[i];
        }else{
            sscanf(str.substr(i + 1, 2).c_str(), "%x", &ii);
            ch = static_cast<char>(ii);
            ret += ch;
            i = i + 2;
        }
    }
    return ret;
}

edited Sep 15 '15 at 09:00

reliasn

answered Apr 30 '15 at 07:59

tormuto

`if(ic < 16) new_str += "%0";` What is this catering for?? @tormuto @reliasn – KriyenKP Feb 20 '17 at 12:26
@Kriyen it is used to pad the encoded HEX with leading zero in case it results in a single letter; since 0 to 15 in HEX is 0 to F. – tormuto Mar 01 '17 at 23:45
I like this approach the best. +1 for using standard libraries. Though there are two issues to fix. I am Czech and used the letter "ý". The result was "%0FFFFFFC3%0FFFFFFBD". First, the 16 switch isn't necessary, since UTF-8 guarantees that all trailing bytes start with 10, and it seemed to fail on my multibyte character. The second issue is the FF, because not all computers have the same number of bits per int. The fix was to skip the 16 switch (not needed) and grab the last two chars from the buffer. (I used a stringstream, since I feel more comfortable with it, and a string buffer.) Still gave point. Like the frame too – Volt Dec 25 '17 at 21:58
@Volt would you be able to post your updated code in a new answer? You mention the issues but it's not enough info for an obvious fix. – gregn3 May 30 '18 at 23:49
This answer has some problems, because it's using strlen. First, this doesn't make sense, because we already know the size of a string object, so it's a waste of time. Much worse, though, is that a string may contain 0-bytes, which would get lost because of the strlen. Also, the if(ic < 16) is inefficient, because this can be covered by printf itself using "%%%02X". And finally, c should be an unsigned byte, otherwise you get the effect that @Volt was describing with leading '0xFFF...'. – Devolus Jan 11 '19 at 08:18
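
The suggestions in the comments above (use the string's own length, let the format string do the zero padding, and treat each byte as unsigned) could be combined roughly as in the sketch below. It is only an illustration of those three points, not tormuto's code, and `url_encode_bytes` is a made-up name.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical name; a sketch of the fixes suggested in the comments.
std::string url_encode_bytes(const std::string &str) {
    std::string out;
    out.reserve(str.size() * 3);  // worst case: every byte becomes %XX
    char buf[4];                  // '%', two hex digits, terminating NUL

    // Use the string's own size, so embedded NUL bytes are encoded rather than lost.
    for (std::string::size_type i = 0; i < str.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(str[i]);
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += str[i];
        } else {
            // "%%" prints a literal '%'; "%02X" zero-pads to two uppercase hex digits.
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned int>(c));
            out += buf;
        }
    }
    return out;
}
```

With this version the UTF-8 bytes of "ý" (0xC3 0xBD) come out as `%C3%BD` instead of the sign-extended `%0FFFFFFC3%0FFFFFFBD` reported above.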

score 7 · Answer 7 · answered Sep 30 '08 at 19:27

7

CGICC includes functions for URL encoding and decoding: form_urlencode and form_urldecode.

answered Sep 30 '08 at 19:27

alanc10n

you just sparked a decent conversation in our office with that library. – J.J. Sep 30 '08 at 19:35
This is actually, the simplest and most correct code. – xryl669 Feb 03 '15 at 08:56

kometen · Answer 8 · 2017-11-09T16:19:39.347

Inspired by xperroni I wrote a decoder. Thank you for the pointer.

#include <cctype> // isdigit, tolower
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

char from_hex(char ch) {
    return isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
}

string url_decode(string text) {
    char h;
    ostringstream escaped;
    escaped.fill('0');

    for (auto i = text.begin(), n = text.end(); i != n; ++i) {
        string::value_type c = (*i);

        if (c == '%') {
            if (n - i > 2) { // both hex digits must lie before text.end()
                h = from_hex(i[1]) << 4 | from_hex(i[2]);
                escaped << h;
                i += 2;
            }
        } else if (c == '+') {
            escaped << ' ';
        } else {
            escaped << c;
        }
    }

    return escaped.str();
}

int main(int argc, char** argv) {
    string msg = "J%C3%B8rn!";
    cout << msg << endl;
    string decodemsg = url_decode(msg);
    cout << decodemsg << endl;

    return 0;
}

edit: Removed unneeded cctype and iomanip includes.

The "if (c == '%')" block needs more out-of-bound checking, i[1] and/or i[2] may be beyond text.end(). I would rename "escaped" to "unescaped", too. "escaped.fill('0');" is probably unneeded. — roalz, Mar 23 '18 at 12:54
Please, look at my version. It is more optimised. https://pastebin.com/g0zMLpsj — KoD, Oct 20 '20 at 10:55
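
roalz's comment above asks for stricter bounds checking. One way to get it is to index into the string and test the remaining length before touching the two characters after a '%'. The sketch below does that and also leaves malformed escapes in place instead of dropping them; that last choice is an assumption of this sketch, not something the answer specifies, and `url_decode_checked` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Hypothetical name; a sketch, not kometen's code.
std::string url_decode_checked(const std::string &text) {
    std::string unescaped;
    unescaped.reserve(text.size());

    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '%' && text.size() - i > 2 &&
            std::isxdigit(static_cast<unsigned char>(text[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(text[i + 2]))) {
            // Both following characters exist and are hex digits: decode them.
            const int byte = std::stoi(text.substr(i + 1, 2), nullptr, 16);
            unescaped += static_cast<char>(byte);
            i += 2;
        } else if (c == '+') {
            unescaped += ' ';  // form-style decoding, as in the answer above
        } else {
            unescaped += c;    // includes a '%' that does not start a valid escape
        }
    }
    return unescaped;
}
```

Inputs such as `"%"`, `"%4"` or `"%G1"` pass through unchanged rather than reading past the end of the string.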

score 6 · Answer 9 · edited Mar 19 '23 at 09:51

6

The Windows API has the functions UrlEscape/UrlUnescape, exported by shlwapi.dll, for this task.

edited Mar 19 '23 at 09:51

Andrew Truckle

answered Oct 23 '14 at 00:47

deltanine

note: UrlEscape does not encode `+` – Orwellophile Oct 16 '17 at 10:15

score 6 · Answer 10 · answered Jan 04 '12 at 19:31

I ended up on this question while searching for an API to decode a URL in a Win32 C++ app. Since the question doesn't specify a platform, assuming Windows isn't a bad thing.

InternetCanonicalizeUrl is the API for Windows programs.

        LPTSTR lpOutputBuffer = new TCHAR[1];
        DWORD dwSize = 1;
        BOOL fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
        DWORD dwError = ::GetLastError();
        if (!fRes && dwError == ERROR_INSUFFICIENT_BUFFER)
        {
            delete [] lpOutputBuffer; // allocated with new[], so it must be released with delete[]
            lpOutputBuffer = new TCHAR[dwSize];
            fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
            if (fRes)
            {
                //lpOutputBuffer has decoded url
            }
            else
            {
                //failed to decode
            }
            if (lpOutputBuffer !=NULL)
            {
                delete [] lpOutputBuffer;
                lpOutputBuffer = NULL;
            }
        }
        else
        {
            //some other error OR the input string url is just 1 char and was successfully decoded
        }

InternetCrackUrl also seems to have flags to specify whether to decode the URL.

score 5 · Answer 11 · edited Jun 29 '11 at 21:53

5

Adding a follow-up to Bill's recommendation for using libcurl: great suggestion, but it needs an update.
After 3 years, the curl_escape function is deprecated, so for future use it's better to use curl_easy_escape.

edited Jun 29 '11 at 21:53

Tobu

answered Jun 28 '11 at 22:11

Bagelzone Ha'bonè

score 4 · Answer 12 · edited Mar 19 '23 at 05:57

4

You can simply use the function AtlEscapeUrl() from atlutil.h; see its documentation for how to use it.

edited Mar 19 '23 at 05:57

Andrew Truckle

answered Jan 22 '18 at 10:16

Pratik

this would only work on windows – kritzikratzi Jan 22 '18 at 14:14
Yes I have tried this on windows. – Pratik Jan 23 '18 at 07:06
just what I needed :) – kofifus Dec 21 '22 at 06:33
I tried this but it did not work correctly for me. See: https://stackoverflow.com/q/75781057/2287576 – Andrew Truckle Mar 19 '23 at 09:50

score 3 · Answer 13 · answered Apr 05 '16 at 16:36

3

Another solution is available using Facebook's folly library : folly::uriEscape and folly::uriUnescape.

answered Apr 05 '16 at 16:36

Dalzhim

jamacoe · Answer 14 · 2021-08-03T05:06:39.040

I couldn't find a URI decode/unescape here that also decodes 2- and 3-byte sequences. Contributing my own version, which converts the C string input to a wstring on the fly:

#include <string>

const char HEX2DEC[55] =
{
     0, 1, 2, 3,  4, 5, 6, 7,  8, 9,-1,-1, -1,-1,-1,-1,
    -1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
    -1,10,11,12, 13,14,15
};

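#define __x2d__(s) HEX2DEC[*(s)-48]
// Parenthesized so the expansion stays intact inside larger expressions such as `... & 0b00111111`
#define __x2d2__(s) (__x2d__(s) << 4 | __x2d__((s)+1))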

std::wstring decodeURI(const char * s) {
    unsigned char b;
    std::wstring ws;
    while (*s) {
        if (*s == '%')
            if ((b = __x2d2__(s + 1)) >= 0x80) {
                if (b >= 0xE0) { // three byte codepoint
                    ws += ((b & 0b00001111) << 12) | ((__x2d2__(s + 4) & 0b00111111) << 6) | (__x2d2__(s + 7) & 0b00111111);
                    s += 9;
                }
                else { // two byte codepoint
                    // a two-byte lead byte (110xxxxx) carries five payload bits
                    ws += (__x2d2__(s + 4) & 0b00111111) | (b & 0b00011111) << 6;
                    s += 6;
                }
            }
            else { // one byte codepoints
                ws += b;
                s += 3;
            }
        else { // no %
            ws += *s;
            s++;
        }
    }
    return ws;
}

`#define __x2d2__(s) (__x2d__(s) << 4 | __x2d__(s+1))` and it shall build with -WError. — Janek Olszak, Jun 13 '17 at 15:04
Sorry but "high performance" while adding single chars to a `wstring` is unrealistic. At least `reserve` enough space, otherwise you will have massive reallocations all the time — Felix Dombek, Aug 17 '17 at 21:59
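
For comparison, the UTF-8 step can be kept separate from the percent-decoding step, with the output reserved up front as Felix Dombek suggests. The sketch below takes bytes that are already percent-decoded and converts them to code points. It is an illustration under stated assumptions, not jamacoe's code: `utf8_to_utf32` is a hypothetical name, it returns `std::u32string` rather than `std::wstring` (an assumption made because `wchar_t` is only 16 bits wide on some platforms), it covers 1 to 4 byte sequences, replaces malformed input with U+FFFD, and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 bytes to Unicode code points.
std::u32string utf8_to_utf32(const std::string &bytes) {
    const char32_t replacement = 0xFFFD;  // used for malformed or truncated input
    std::u32string out;
    out.reserve(bytes.size());  // upper bound: at most one code point per byte

    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += replacement; ++i; continue; }  // stray continuation or invalid lead

        // Not enough bytes left for the announced sequence: stop here.
        if (bytes.size() - i <= extra) { out += replacement; break; }

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (next & 0x3F);
        }
        out += ok ? cp : replacement;
        i += ok ? extra + 1 : 1;
    }
    return out;
}
```

Splitting the work this way means any of the byte-oriented decoders in this thread can be used first, with the conversion applied to their result.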

Johan · Answer 15 · 2013-11-09T11:18:58.150

This version is pure C and can optionally normalize the resource path. Using it with C++ is trivial:

#include <string>
#include <iostream>

int main(int argc, char** argv)
{
    const std::string src("/some.url/foo/../bar/%2e/");
    std::cout << "src=\"" << src << "\"" << std::endl;

    // either do it the C++ conformant way:
    char* dst_buf = new char[src.size() + 1];
    urldecode(dst_buf, src.c_str(), 1);
    std::string dst1(dst_buf);
    delete[] dst_buf;
    std::cout << "dst1=\"" << dst1 << "\"" << std::endl;

    // or in-place with the &[0] trick to skip the new/delete
    std::string dst2;
    dst2.resize(src.size() + 1);
    dst2.resize(urldecode(&dst2[0], src.c_str(), 1));
    std::cout << "dst2=\"" << dst2 << "\"" << std::endl;
}

Outputs:

src="/some.url/foo/../bar/%2e/"
dst1="/some.url/bar/"
dst2="/some.url/bar/"

And the actual function:

#include <stddef.h>
#include <ctype.h>

/**
 * decode a percent-encoded C string with optional path normalization
 *
 * The buffer pointed to by @dst must be at least strlen(@src) + 1 bytes,
 * because the terminating null is written as well.
 * Decoding stops at the first character from @src that decodes to null.
 * Path normalization will remove redundant slashes and slash+dot sequences,
 * as well as removing path components when slash+dot+dot is found. It will
 * keep the root slash (if one was present) and will stop normalization
 * at the first question mark found (so query parameters won't be normalized).
 *
 * @param dst       destination buffer
 * @param src       source buffer
 * @param normalize perform path normalization if nonzero
 * @return          number of valid characters in @dst
 * @author          Johan Lindh <johan@linkdata.se>
 * @legalese        BSD licensed (http://opensource.org/licenses/BSD-2-Clause)
 */
ptrdiff_t urldecode(char* dst, const char* src, int normalize)
{
    char* org_dst = dst;
    int slash_dot_dot = 0;
    char ch, a, b;
    do {
        ch = *src++;
        if (ch == '%' && isxdigit(a = src[0]) && isxdigit(b = src[1])) {
            if (a < 'A') a -= '0';
            else if(a < 'a') a -= 'A' - 10;
            else a -= 'a' - 10;
            if (b < 'A') b -= '0';
            else if(b < 'a') b -= 'A' - 10;
            else b -= 'a' - 10;
            ch = 16 * a + b;
            src += 2;
        }
        if (normalize) {
            switch (ch) {
            case '/':
                if (slash_dot_dot < 3) {
                    /* compress consecutive slashes and remove slash-dot */
                    dst -= slash_dot_dot;
                    slash_dot_dot = 1;
                    break;
                }
                /* fall-through */
            case '?':
                /* at start of query, stop normalizing */
                if (ch == '?')
                    normalize = 0;
                /* fall-through */
            case '\0':
                if (slash_dot_dot > 1) {
                    /* remove trailing slash-dot-(dot) */
                    dst -= slash_dot_dot;
                    /* remove parent directory if it was two dots */
                    if (slash_dot_dot == 3)
                        while (dst > org_dst && *--dst != '/')
                            /* empty body */;
                    slash_dot_dot = (ch == '/') ? 1 : 0;
                    /* keep the root slash if any */
                    if (!slash_dot_dot && dst == org_dst && *dst == '/')
                        ++dst;
                }
                break;
            case '.':
                if (slash_dot_dot == 1 || slash_dot_dot == 2) {
                    ++slash_dot_dot;
                    break;
                }
                /* fall-through */
            default:
                slash_dot_dot = 0;
            }
        }
        *dst++ = ch;
    } while(ch);
    return (dst - org_dst) - 1;
}

Thanks. Here it is without the optional path stuff. http://pastebin.com/RN5g7g9u — Julian, Jun 03 '14 at 04:09
This does not follow any recommendation, and is completely wrong compared to what the author asks for ('+' is not replaced by space, for example). Path normalization has nothing to do with URL decoding. If you intend to normalize your path, you should first split your URL into parts (scheme, authority, path, query, fragment) and then apply whatever algorithm you like only to the path part. – xryl669 Feb 03 '15 at 09:04
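
The "split first" advice in the last comment can be illustrated with a small sketch. It assumes the scheme and authority have already been removed, so the input looks like `path?query#fragment`, and it works on the still-encoded text so that an escaped `%3F` or `%23` is never mistaken for a delimiter. `UrlTail` and `split_tail` are hypothetical names used only here.

```cpp
#include <string>

// Hypothetical container for the parts that follow the authority.
struct UrlTail {
    std::string path;
    std::string query;     // without the leading '?'
    std::string fragment;  // without the leading '#'
};

// Splits "path?query#fragment" at the first '#' and then at the first '?',
// while the text is still percent-encoded.
UrlTail split_tail(const std::string &encoded) {
    UrlTail parts;
    std::string rest = encoded;

    const std::string::size_type hash = rest.find('#');
    if (hash != std::string::npos) {
        parts.fragment = rest.substr(hash + 1);
        rest.erase(hash);
    }
    const std::string::size_type question = rest.find('?');
    if (question != std::string::npos) {
        parts.query = rest.substr(question + 1);
        rest.erase(question);
    }
    parts.path = rest;
    return parts;
}
```

Each part can then be decoded on its own, and any normalization applied to `path` only.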

score 1 · Answer 16 · answered May 28 '15 at 07:03

the juicy bits

#include <ctype.h> // isdigit, tolower

char from_hex(char ch) {
  return isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
}

char to_hex(char code) {
  static char hex[] = "0123456789abcdef";
  return hex[code & 15];
}

noting that

char d = from_hex(hex[0]) << 4 | from_hex(hex[1]);

as in

// %7B = '{'

char d = from_hex('7') << 4 | from_hex('B');
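
To show how the two helpers fit together, here is a small round-trip sketch. `from_hex` and `to_hex` follow the answer above (with the return type spelled out and casts added before the `<cctype>` calls); `encode_byte` and `decode_byte` are hypothetical names introduced for illustration, and `decode_byte` assumes both arguments are valid hex digits.

```cpp
#include <cctype>
#include <string>

// Converts one hex digit to its value (assumes a valid hex digit).
char from_hex(char ch) {
    const unsigned char u = static_cast<unsigned char>(ch);
    return static_cast<char>(std::isdigit(u) ? u - '0' : std::tolower(u) - 'a' + 10);
}

// Converts the low four bits of code to a lowercase hex digit.
char to_hex(char code) {
    static const char hex[] = "0123456789abcdef";
    return hex[code & 15];
}

// Hypothetical helper: "%" followed by the high and the low nibble of one byte.
std::string encode_byte(char c) {
    std::string out = "%";
    out += to_hex(static_cast<char>(static_cast<unsigned char>(c) >> 4));  // high nibble
    out += to_hex(c);  // low nibble; to_hex masks with 15 itself
    return out;
}

// Hypothetical helper: inverse of encode_byte for two valid hex digits.
char decode_byte(char high, char low) {
    return static_cast<char>(from_hex(high) << 4 | from_hex(low));
}
```

Because the table in `to_hex` is lowercase, `encode_byte('{')` yields `%7b`; `decode_byte` accepts either case.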

score 1 · Answer 17 · answered Mar 11 '16 at 13:39

You can use the "g_uri_escape_string()" function provided by glib.h. https://developer.gnome.org/glib/stable/glib-URI-Functions.html

#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
int main() {
    char *uri = "http://www.example.com?hello world";
    char *encoded_uri = NULL;
    //as per wiki (https://en.wikipedia.org/wiki/Percent-encoding)
    char *escape_char_str = "!*'();:@&=+$,/?#[]"; 
    encoded_uri = g_uri_escape_string(uri, escape_char_str, TRUE);
    printf("[%s]\n", encoded_uri);
    g_free(encoded_uri); // memory returned by g_uri_escape_string must be released with g_free

    return 0;
}

compile it with:

gcc encoding_URI.c `pkg-config --cflags --libs glib-2.0`

score 0 · Answer 18 · edited Jun 27 '16 at 12:23

I know the question asks for a C++ method, but for those who might need it, I came up with a very short function in plain C to encode a string. It doesn't create a new string; rather, it alters the existing one in place, meaning that the buffer must be large enough to hold the encoded string. Very easy to keep up.

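#include <string.h>

void urlEncode(char *string)
{
    unsigned char charToEncode; /* unsigned, so the shifts below index the table safely */
    size_t posToEncode;
    /* strspn gives the length of the leading run of unreserved characters;
       stop once that run reaches the end of the string */
    while ((posToEncode=strspn(string,"1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~"))<strlen(string))
    {
        charToEncode=(unsigned char)string[posToEncode];
        /* shift the tail (including the terminating null) two bytes to the right */
        memmove(string+posToEncode+3,string+posToEncode+1,strlen(string+posToEncode));
        string[posToEncode]='%';
        string[posToEncode+1]="0123456789ABCDEF"[charToEncode>>4];
        string[posToEncode+2]="0123456789ABCDEF"[charToEncode&0xf];
        string+=posToEncode+3;
    }
}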
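
Since the function above grows the string in place, the caller has to size the buffer beforehand. One way to compute the required size is sketched below, in C++ like the rest of the thread: every character outside the unreserved set grows from one byte to three. `encoded_size` is a hypothetical helper name, not part of the answer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: bytes needed (including the terminating NUL) to hold the
// percent-encoded form of s, using the same unreserved set as the function above.
std::size_t encoded_size(const char *s) {
    static const char unreserved[] =
        "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~";
    std::size_t size = 1;  // terminating NUL
    for (; *s; ++s) {
        // Characters found in the unreserved set stay as they are; others become "%XX".
        size += std::strchr(unreserved, *s) ? 1 : 3;
    }
    return size;
}
```

A buffer of `encoded_size(input)` bytes, with the input copied into it first, is then large enough for the in-place encoder.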

score -2 · Answer 19 · answered Feb 04 '15 at 16:27

Had to do it in a project without Boost. So, ended up writing my own. I will just put it on GitHub: https://github.com/corporateshark/LUrlParser

clParseURL URL = clParseURL::ParseURL( "https://name:pwd@github.com:80/path/res" );

if ( URL.IsValid() )
{
    cout << "Scheme    : " << URL.m_Scheme << endl;
    cout << "Host      : " << URL.m_Host << endl;
    cout << "Port      : " << URL.m_Port << endl;
    cout << "Path      : " << URL.m_Path << endl;
    cout << "Query     : " << URL.m_Query << endl;
    cout << "Fragment  : " << URL.m_Fragment << endl;
    cout << "User name : " << URL.m_UserName << endl;
    cout << "Password  : " << URL.m_Password << endl;
}

Your link is to a library which parses a URL. It does not %-encode a URL. (Or at least, I couldn't see a % anywhere in the source.) As such, I don't think this answers the question. — Martin Bonner supports Monica, Nov 20 '15 at 13:23
