Does anyone know of any good C++ code that does this?
-
How about accepting an answer? – gsamaras Jul 13 '16 at 22:48
19 Answers
I faced the encoding half of this problem the other day. Unhappy with the available options, and after taking a look at this C sample code, I decided to roll my own C++ URL-encode function:
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;
string url_encode(const string &value) {
ostringstream escaped;
escaped.fill('0');
escaped << hex;
for (string::const_iterator i = value.begin(), n = value.end(); i != n; ++i) {
string::value_type c = (*i);
// Keep alphanumeric and other accepted characters intact
if (isalnum((unsigned char) c) || c == '-' || c == '_' || c == '.' || c == '~') {
escaped << c;
continue;
}
// Any other characters are percent-encoded
escaped << uppercase;
escaped << '%' << setw(2) << int((unsigned char) c);
escaped << nouppercase;
}
return escaped.str();
}
The implementation of the decode function is left as an exercise to the reader. :P
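For readers who would rather not do the exercise, here is a minimal sketch of a matching decoder. It is not part of the original answer: `hex_value` is a hypothetical helper, and the choice to copy malformed escapes through unchanged (rather than reject them) is an assumption. It does not turn '+' into a space, since the encoder above never produces '+'.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: numeric value of one hex digit.
// The caller must already have checked the character with isxdigit.
static int hex_value(unsigned char c) {
    if (std::isdigit(c)) return c - '0';
    return std::tolower(c) - 'a' + 10;
}

// Sketch of a decoder for percent-encoded input.
// A '%' that is not followed by two hex digits is copied through unchanged.
std::string url_decode(const std::string &value) {
    std::string out;
    out.reserve(value.size());
    for (std::string::size_type i = 0; i < value.size(); ++i) {
        if (value[i] == '%' && i + 2 < value.size() &&
            std::isxdigit(static_cast<unsigned char>(value[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(value[i + 2]))) {
            int hi = hex_value(static_cast<unsigned char>(value[i + 1]));
            int lo = hex_value(static_cast<unsigned char>(value[i + 2]));
            out += static_cast<char>((hi << 4) | lo);
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += value[i];
        }
    }
    return out;
}
```

The bounds check comes before the `isxdigit` calls, so a trailing '%' never reads past the end of the string.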

-
I believe it's more generic (more generally correct) to replace ' ' with "%20". I've updated the code accordingly; feel free to roll back if you disagree. – Josh Kelley Jul 15 '14 at 17:48
-
Nah, I agree. Also took the chance to remove that pointless `setw(0)` call (at the time I thought minimal width would remain set until I changed it back, but in fact it is reset after the next input). – xperroni Jul 15 '14 at 22:19
-
-
Don't you mean it the other way around – convert _space to '+'_? Anyway, read the first comment by Josh Kelley: spaces are being converted to '%20', which is just as well. – xperroni Feb 04 '15 at 02:36
-
I had to add std::uppercase to the line "escaped << '%' << std::uppercase << std::setw(2) << int((unsigned char) c);", in case other people are wondering why this returns, for example, %3a instead of %3A. – gmm Sep 11 '15 at 09:06
-
@gumlym, that is not necessary. According to [RFC 3986](https://tools.ietf.org/html/rfc3986): "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively." – oferei Sep 29 '15 at 14:12
-
Well, I had to do that when URL-encoding to create a signature for Amazon AWS: it didn't work when it returned lowercase, but it did work with uppercase. That is why I posted it. – gmm Sep 29 '15 at 14:21
-
In fact the RFC also says that "[f]or consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent- encodings." So it is a reasonable change. – xperroni Sep 30 '15 at 05:13
-
One thing we should add for posterity. This answer handles Unicode! There are many such solutions out there that do not. – Jonathan Henson Feb 19 '16 at 21:58
-
I don't see the point of checking for "c >= 0" in the conditional though. Characters are coming from a text string, which ought to contain only valid character codes. The only way it would contain negative values was if the string was corrupted, or if an input was deliberately crafted to break the function – and in both cases you might as well _want_ the program to terminate. – xperroni Feb 21 '16 at 01:22
-
It looks wrong because UTF-8 strings are not supported (http://www.w3schools.com/tags/ref_urlencode.asp). It seems to work only for Windows-1252 – Skywalker13 Dec 01 '16 at 16:32
-
The problem was just `isalnum(c)`, it must be changed to `isalnum((unsigned char) c)` – Skywalker13 Dec 01 '16 at 16:44
-
Notice the character type is already parameterized to the string's `value_type`. If you want to support UTF-8 the correct change is to replace references to `std::string` with e.g. `std::u8string` in C++ 20. – xperroni Dec 20 '18 at 20:15
-
The initialization of the for loop could be replaced with `for (string::value_type c: value)`, along with stripping the first instruction. – Stavros Avramidis Jun 06 '19 at 15:07
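Putting the two suggestions from these comments together (the range-based `for` and the `unsigned char` cast before `isalnum`), the encoder might look like the sketch below. This is an illustrative rewrite, not the answer author's code; it sets `std::uppercase` once up front, which is safe because that flag only affects numeric output.

```cpp
#include <cctype>
#include <iomanip>
#include <sstream>
#include <string>

std::string url_encode(const std::string &value) {
    std::ostringstream escaped;
    escaped.fill('0');
    escaped << std::hex << std::uppercase;  // uppercase hex digits, as RFC 3986 recommends

    for (char ch : value) {
        // Cast first: passing a negative char to isalnum is undefined behaviour
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << ch;  // unreserved characters are kept intact
        } else {
            // setw applies only to the next insertion, so it is repeated per byte
            escaped << '%' << std::setw(2) << static_cast<int>(c);
        }
    }
    return escaped.str();
}
```

Each byte of a multi-byte UTF-8 sequence is escaped separately, so "ý" (bytes C3 BD) becomes "%C3%BD".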
-
-
Answering my own question...
libcurl has curl_easy_escape for encoding.
For decoding, curl_easy_unescape.
#include <string>
#include <curl/curl.h>
std::string url_encode(const std::string& decoded)
{
const auto encoded_value = curl_easy_escape(nullptr, decoded.c_str(), static_cast<int>(decoded.length()));
if (encoded_value == nullptr) return std::string(); // curl_easy_escape returns NULL on failure
std::string result(encoded_value);
curl_free(encoded_value);
return result;
}
std::string url_decode(const std::string& encoded)
{
int output_length;
const auto decoded_value = curl_easy_unescape(nullptr, encoded.c_str(), static_cast<int>(encoded.length()), &output_length);
if (decoded_value == nullptr) return std::string(); // curl_easy_unescape returns NULL on failure
std::string result(decoded_value, output_length);
curl_free(decoded_value);
return result;
}

-
You should accept this answer so it is shown at the top (and people can find it easier). – Mouagip Nov 04 '15 at 13:47
-
-
Related question: why does curl's unescape not handle changing '+' to space? Isn't that standard procedure when URL decoding? – Stéphane May 27 '19 at 06:59
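On the '+' question: treating '+' as a space is a convention of HTML form encoding (application/x-www-form-urlencoded), not of generic URI percent-encoding as described in RFC 3986, which is presumably why a general-purpose unescape leaves it alone. If the input is form data, a small pre-pass can handle it. The helper name `plus_to_space` below is hypothetical, and the sketch assumes it is applied to the still-encoded text before percent-decoding, so that a literal plus sign sent as "%2B" survives.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper for form-encoded data: turn every '+' into a space.
// Run this on the raw, still-encoded string, then percent-decode the result.
std::string plus_to_space(std::string encoded) {
    std::replace(encoded.begin(), encoded.end(), '+', ' ');
    return encoded;
}
```

The result can then be passed to whichever percent-decoder is in use, for example the `url_decode` wrapper around `curl_easy_unescape` shown above.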
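#include <cctype>
#include <cstdlib>
#include <string>
using namespace std;
string urlDecode(const string &SRC) {
    string ret;
    for (string::size_type i = 0; i < SRC.length(); i++) {
        // Decode only when two hex digits really follow the '%'
        if (SRC[i] == '%' && i + 2 < SRC.length() &&
            isxdigit((unsigned char) SRC[i + 1]) && isxdigit((unsigned char) SRC[i + 2])) {
            ret += static_cast<char>(strtol(SRC.substr(i + 1, 2).c_str(), NULL, 16));
            i += 2;
        } else {
            // Ordinary character, or a malformed escape copied through unchanged
            ret += SRC[i];
        }
    }
    return ret;
}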
not the best, but working fine ;-)

cpp-netlib has functions
namespace boost {
namespace network {
namespace uri {
inline std::string decoded(const std::string &input);
inline std::string encoded(const std::string &input);
}
}
}
They make it very easy to encode and decode URL strings.

-
omg thank you. The documentation on cpp-netlib is sparse. Do you have any links to good cheat sheets? – user249806 May 13 '17 at 13:12
[Necromancer mode on]
Stumbled upon this question when I was looking for a fast, modern, platform-independent and elegant solution. Didn't like any of the above; cpp-netlib would be the winner, but it has a horrific memory vulnerability in its "decoded" function. So I came up with a Boost Spirit Qi/Karma solution.
namespace bsq = boost::spirit::qi;
namespace bk = boost::spirit::karma;
bsq::int_parser<unsigned char, 16, 2, 2> hex_byte;
template <typename InputIterator>
struct unescaped_string
: bsq::grammar<InputIterator, std::string(char const *)> {
unescaped_string() : unescaped_string::base_type(unesc_str) {
unesc_char.add("+", ' ');
unesc_str = *(unesc_char | "%" >> hex_byte | bsq::char_);
}
bsq::rule<InputIterator, std::string(char const *)> unesc_str;
bsq::symbols<char const, char const> unesc_char;
};
template <typename OutputIterator>
struct escaped_string : bk::grammar<OutputIterator, std::string(char const *)> {
escaped_string() : escaped_string::base_type(esc_str) {
esc_str = *(bk::char_("a-zA-Z0-9_.~-") | "%" << bk::right_align(2,0)[bk::hex]);
}
bk::rule<OutputIterator, std::string(char const *)> esc_str;
};
Usage of the above is as follows:
std::string unescape(const std::string &input) {
std::string retVal;
retVal.reserve(input.size());
typedef std::string::const_iterator iterator_type;
char const *start = "";
iterator_type beg = input.begin();
iterator_type end = input.end();
unescaped_string<iterator_type> p;
if (!bsq::parse(beg, end, p(start), retVal))
retVal = input;
return retVal;
}
std::string escape(const std::string &input) {
typedef std::back_insert_iterator<std::string> sink_type;
std::string retVal;
retVal.reserve(input.size() * 3);
sink_type sink(retVal);
char const *start = "";
escaped_string<sink_type> g;
if (!bk::generate(sink, g(start), input))
retVal = input;
return retVal;
}
[Necromancer mode off]
EDIT01: fixed the zero padding stuff - special thanks to Hartmut Kaiser
EDIT02: Live on CoLiRu

-
What's the “horrific memory vulnerability” of `cpp-netlib`? Can you provide a brief explanation or a link? – Craig M. Brandenburg Jul 07 '15 at 21:26
-
It (the problem) was already reported, so I didn't report it and actually don't remember... something like an access violation when trying to parse an invalid escape sequence, or something – kreuzerkrieg Jul 08 '15 at 14:47
-
oh, here you go https://github.com/cpp-netlib/cpp-netlib/issues/501 – kreuzerkrieg Jul 08 '15 at 14:50
-
-
I suggest using uint_parser instead of int_parser. As it is, you would probably accept a - sign – sandwood Jun 14 '22 at 09:43
Simply appending the decimal int value of a char to '%' will not work when encoding; the value is supposed to be the hex equivalent, e.g. '/' is '%2F', not '%47'.
I think this is the best and most concise solution for both URL encoding and decoding (few header dependencies).
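#include <cctype>
#include <cstdio>
#include <string>
using namespace std;
string urlEncode(const string &str){
    string new_str;
    char bufHex[4];
    // Iterate over str.size() rather than strlen() so embedded 0-bytes are not lost
    for (string::size_type i = 0; i < str.size(); i++){
        // Work on the unsigned value so bytes >= 0x80 are not sign-extended
        unsigned char c = static_cast<unsigned char>(str[i]);
        // uncomment this if you want to encode spaces with +
        /*if (c==' ') new_str += '+';
        else */if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') new_str += str[i];
        else {
            // "%%%02X" emits '%' followed by two zero-padded uppercase hex digits
            snprintf(bufHex, sizeof bufHex, "%%%02X", static_cast<unsigned>(c));
            new_str += bufHex;
        }
    }
    return new_str;
}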
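#include <cstdlib> // strtol
string urlDecode(const string &str){
    string ret;
    string::size_type len = str.length();
    for (string::size_type i = 0; i < len; i++){
        if (str[i] == '+'){
            ret += ' ';
        }else if (str[i] == '%' && i + 2 < len &&
                  isxdigit((unsigned char) str[i + 1]) && isxdigit((unsigned char) str[i + 2])){
            // Both characters after '%' are hex digits: decode them as one byte
            ret += static_cast<char>(strtol(str.substr(i + 1, 2).c_str(), NULL, 16));
            i += 2;
        }else{
            // Ordinary character, or a malformed escape copied through unchanged
            ret += str[i];
        }
    }
    return ret;
}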
-
`if(ic < 16) new_str += "%0";` What is this catering for?? @tormuto @reliasn – KriyenKP Feb 20 '17 at 12:26
-
@Kriyen it is used to pad the encoded HEX with a leading zero in case it results in a single digit, since 0 to 15 in HEX is 0 to F. – tormuto Mar 01 '17 at 23:45
-
I like this approach the best. +1 for using standard libraries. Though there are two issues to fix. I am Czech and used the letter "ý". The result was "%0FFFFFFC3%0FFFFFFBD". First, using the 16 switch isn't necessary since UTF-8 guarantees to start all trailing bytes with 10, and it seemed to fail my multibyte. Second issue is the FF, because not all computers have the same amount of bits per int. The fix was to skip the 16 switch (not needed) and grab the last two chars from the buffer. (I used stringstream since I feel more comfortable with it and a string buffer.) Still gave a point. Like the frame too – Volt Dec 25 '17 at 21:58
-
@Volt would you be able to post your updated code in a new answer? You mention the issues but it's not enough info for an obvious fix. – gregn3 May 30 '18 at 23:49
-
This answer has some problems because it's using strlen. First, this doesn't make sense, because we already know the size of a string object, so it's a waste of time. Much worse, though, is that a string may contain 0-bytes, which would get lost because of the strlen. Also, the if(ic < 16) is inefficient, because this can be covered by printf itself using "%%%02X". And finally, c should be an unsigned byte, otherwise you get the effect that @Volt was describing, with leading '0xFFF...'. – Devolus Jan 11 '19 at 08:18
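The two formatting points in that last comment can be sketched in a few lines. `percent_byte` is a hypothetical helper name; the sketch only shows the "%%%02X" format string together with the `unsigned char` cast, which together avoid both the manual zero padding and the sign-extended "FFFFFFC3" output.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format one byte as "%XX".
// Casting to unsigned char first keeps bytes >= 0x80 from being
// sign-extended when they are promoted for the variadic call.
std::string percent_byte(char c) {
    char buf[4];  // '%', two hex digits, terminating NUL
    std::snprintf(buf, sizeof buf, "%%%02X",
                  static_cast<unsigned>(static_cast<unsigned char>(c)));
    return buf;
}
```

With this, the first byte of "ý" in UTF-8 formats as "%C3" and a newline as "%0A", with no separate branch for values below 16.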
CGICC includes functions for URL encoding and decoding: form_urlencode and form_urldecode.

Inspired by xperroni I wrote a decoder. Thank you for the pointer.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
char from_hex(char ch) {
return isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
}
string url_decode(string text) {
char h;
ostringstream escaped;
escaped.fill('0');
for (auto i = text.begin(), n = text.end(); i != n; ++i) {
string::value_type c = (*i);
if (c == '%') {
if (n - i > 2) { // both hex digits must lie inside the string
h = from_hex(i[1]) << 4 | from_hex(i[2]);
escaped << h;
i += 2;
}
} else if (c == '+') {
escaped << ' ';
} else {
escaped << c;
}
}
return escaped.str();
}
int main(int argc, char** argv) {
string msg = "J%C3%B8rn!";
cout << msg << endl;
string decodemsg = url_decode(msg);
cout << decodemsg << endl;
return 0;
}
edit: Removed unneeded cctype and iomanip includes.

-
The "if (c == '%')" block needs more out-of-bound checking, i[1] and/or i[2] may be beyond text.end(). I would rename "escaped" to "unescaped", too. "escaped.fill('0');" is probably unneeded. – roalz Mar 23 '18 at 12:54
-
Please, look at my version. It is more optimised. https://pastebin.com/g0zMLpsj – KoD Oct 20 '20 at 10:55
The Windows API has the functions UrlEscape/UrlUnescape, exported by shlwapi.dll, for this task.

I ended up on this question when searching for an API to decode a URL in a Win32 C++ app. Since the question doesn't specify a platform, assuming Windows isn't unreasonable.
InternetCanonicalizeUrl is the API for Windows programs. More info here
LPTSTR lpOutputBuffer = new TCHAR[1];
DWORD dwSize = 1;
BOOL fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
DWORD dwError = ::GetLastError();
if (!fRes && dwError == ERROR_INSUFFICIENT_BUFFER)
{
delete [] lpOutputBuffer; // allocated with new[], so it must be released with delete []
lpOutputBuffer = new TCHAR[dwSize];
fRes = ::InternetCanonicalizeUrl(strUrl, lpOutputBuffer, &dwSize, ICU_DECODE | ICU_NO_ENCODE);
if (fRes)
{
//lpOutputBuffer has decoded url
}
else
{
//failed to decode
}
if (lpOutputBuffer !=NULL)
{
delete [] lpOutputBuffer;
lpOutputBuffer = NULL;
}
}
else
{
//some other error OR the input string url is just 1 char and was successfully decoded
}
InternetCrackUrl (here) also seems to have flags to specify whether to decode url

Adding a follow-up to Bill's recommendation for using libcurl: great suggestion, but it needs an update.
Three years on, the curl_escape function is deprecated, so for future use it's better to use curl_easy_escape.

You can simply use the function AtlEscapeUrl() from atlutil.h; just go through its documentation on how to use it.

-
I tried this but it did not work correctly for me. See: https://stackoverflow.com/q/75781057/2287576 – Andrew Truckle Mar 19 '23 at 09:50
Another solution is available using Facebook's folly library: folly::uriEscape and folly::uriUnescape.

I couldn't find a URI decode/unescape here that also decodes 2- and 3-byte sequences. Contributing my own version, which converts the C string input to a wstring on the fly:
#include <string>
const char HEX2DEC[55] =
{
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1, -1,-1,-1,-1,
-1,10,11,12, 13,14,15,-1, -1,-1,-1,-1, -1,-1,-1,-1,
-1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1, -1,-1,-1,-1,
-1,10,11,12, 13,14,15
};
#define __x2d__(s) HEX2DEC[*(s)-48]
#define __x2d2__(s) (__x2d__(s) << 4 | __x2d__((s)+1))
std::wstring decodeURI(const char * s) {
unsigned char b;
std::wstring ws;
while (*s) {
if (*s == '%')
if ((b = __x2d2__(s + 1)) >= 0x80) {
if (b >= 0xE0) { // three byte codepoint
ws += ((b & 0b00001111) << 12) | (((__x2d2__(s + 4)) & 0b00111111) << 6) | ((__x2d2__(s + 7)) & 0b00111111);
s += 9;
}
else { // two byte codepoint
ws += ((__x2d2__(s + 4)) & 0b00111111) | (b & 0b00011111) << 6; // a two-byte lead byte carries five payload bits
s += 6;
}
}
else { // one byte codepoints
ws += b;
s += 3;
}
else { // no %
ws += *s;
s++;
}
}
return ws;
}

-
`#define __x2d2__(s) (__x2d__(s) << 4 | __x2d__(s+1))` and it will build with -Werror. – Janek Olszak Jun 13 '17 at 15:04
-
Sorry, but "high performance" while adding single chars to a `wstring` is unrealistic. At least `reserve` enough space, otherwise you will have massive reallocations all the time – Felix Dombek Aug 17 '17 at 21:59
This version is pure C and can optionally normalize the resource path. Using it with C++ is trivial:
#include <string>
#include <iostream>
int main(int argc, char** argv)
{
const std::string src("/some.url/foo/../bar/%2e/");
std::cout << "src=\"" << src << "\"" << std::endl;
// either do it the C++ conformant way:
char* dst_buf = new char[src.size() + 1];
urldecode(dst_buf, src.c_str(), 1);
std::string dst1(dst_buf);
delete[] dst_buf;
std::cout << "dst1=\"" << dst1 << "\"" << std::endl;
// or in-place with the &[0] trick to skip the new/delete
std::string dst2;
dst2.resize(src.size() + 1);
dst2.resize(urldecode(&dst2[0], src.c_str(), 1));
std::cout << "dst2=\"" << dst2 << "\"" << std::endl;
}
Outputs:
src="/some.url/foo/../bar/%2e/"
dst1="/some.url/bar/"
dst2="/some.url/bar/"
And the actual function:
#include <stddef.h>
#include <ctype.h>
/**
* decode a percent-encoded C string with optional path normalization
*
* The buffer pointed to by @dst must be at least strlen(@src) + 1 bytes,
* because the terminating null is written as well.
* Decoding stops at the first character from @src that decodes to null.
* Path normalization will remove redundant slashes and slash+dot sequences,
* as well as removing path components when slash+dot+dot is found. It will
* keep the root slash (if one was present) and will stop normalization
* at the first questionmark found (so query parameters won't be normalized).
*
* @param dst destination buffer
* @param src source buffer
* @param normalize perform path normalization if nonzero
* @return number of valid characters in @dst
* @author Johan Lindh <johan@linkdata.se>
* @legalese BSD licensed (http://opensource.org/licenses/BSD-2-Clause)
*/
ptrdiff_t urldecode(char* dst, const char* src, int normalize)
{
char* org_dst = dst;
int slash_dot_dot = 0;
char ch, a, b;
do {
ch = *src++;
if (ch == '%' && isxdigit(a = src[0]) && isxdigit(b = src[1])) {
if (a < 'A') a -= '0';
else if(a < 'a') a -= 'A' - 10;
else a -= 'a' - 10;
if (b < 'A') b -= '0';
else if(b < 'a') b -= 'A' - 10;
else b -= 'a' - 10;
ch = 16 * a + b;
src += 2;
}
if (normalize) {
switch (ch) {
case '/':
if (slash_dot_dot < 3) {
/* compress consecutive slashes and remove slash-dot */
dst -= slash_dot_dot;
slash_dot_dot = 1;
break;
}
/* fall-through */
case '?':
/* at start of query, stop normalizing */
if (ch == '?')
normalize = 0;
/* fall-through */
case '\0':
if (slash_dot_dot > 1) {
/* remove trailing slash-dot-(dot) */
dst -= slash_dot_dot;
/* remove parent directory if it was two dots */
if (slash_dot_dot == 3)
while (dst > org_dst && *--dst != '/')
/* empty body */;
slash_dot_dot = (ch == '/') ? 1 : 0;
/* keep the root slash if any */
if (!slash_dot_dot && dst == org_dst && *dst == '/')
++dst;
}
break;
case '.':
if (slash_dot_dot == 1 || slash_dot_dot == 2) {
++slash_dot_dot;
break;
}
/* fall-through */
default:
slash_dot_dot = 0;
}
}
*dst++ = ch;
} while(ch);
return (dst - org_dst) - 1;
}

-
Thanks. Here it is without the optional path stuff. http://pastebin.com/RN5g7g9u – Julian Jun 03 '14 at 04:09
-
This does not follow any recommendation, and is completely wrong compared to what the author asks for ('+' is not replaced by space, for example). Path normalization has nothing to do with URL decoding. If you intend to normalize your path, you should first split your URL into parts (scheme, authority, path, query, fragment) and then apply whatever algorithm you like only on the path part. – xryl669 Feb 03 '15 at 09:04
the juicy bits
#include <ctype.h> // isdigit, tolower
char from_hex(char ch) {
return isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;
}
char to_hex(char code) {
static char hex[] = "0123456789abcdef";
return hex[code & 15];
}
noting that
char d = from_hex(hex[0]) << 4 | from_hex(hex[1]);
as in
// %7B = '{'
char d = from_hex('7') << 4 | from_hex('B');
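To show how the two helpers fit together in both directions, here is a small self-contained sketch. `encode_byte` and `decode_pair` are hypothetical wrapper names that are not in the answer, and the `unsigned char` casts are an added precaution for bytes above 0x7F. Note that `to_hex` as written produces lowercase digits.

```cpp
#include <cctype>
#include <string>

// Value of one hex digit; assumes ch is a valid hex digit.
char from_hex(char ch) {
    unsigned char u = static_cast<unsigned char>(ch);
    return static_cast<char>(std::isdigit(u) ? u - '0' : std::tolower(u) - 'a' + 10);
}

// Hex digit for the low four bits of code.
char to_hex(char code) {
    static const char hex[] = "0123456789abcdef";
    return hex[code & 15];
}

// Hypothetical wrapper: one byte -> "%xx" (high nibble first).
std::string encode_byte(char c) {
    std::string out = "%";
    out += to_hex(static_cast<char>(static_cast<unsigned char>(c) >> 4));
    out += to_hex(c);
    return out;
}

// Hypothetical wrapper: two hex digits -> the byte they encode.
char decode_pair(char hi, char lo) {
    return static_cast<char>(from_hex(hi) << 4 | from_hex(lo));
}
```

Since `from_hex` lowercases its input, `decode_pair('7', 'B')` and `decode_pair('7', 'b')` both give '{'.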

You can use the "g_uri_escape_string()" function provided by glib.h. https://developer.gnome.org/glib/stable/glib-URI-Functions.html
#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
int main() {
char *uri = "http://www.example.com?hello world";
char *encoded_uri = NULL;
//as per wiki (https://en.wikipedia.org/wiki/Percent-encoding)
char *escape_char_str = "!*'();:@&=+$,/?#[]";
encoded_uri = g_uri_escape_string(uri, escape_char_str, TRUE);
printf("[%s]\n", encoded_uri);
g_free(encoded_uri); // strings returned by GLib should be released with g_free()
return 0;
}
compile it with:
gcc encoding_URI.c `pkg-config --cflags --libs glib-2.0`

I know the question asks for a C++ method, but for those who might need it, I came up with a very short function in plain C to encode a string. It doesn't create a new string; it alters the existing one in place, so the buffer must be large enough to hold the encoded result (up to three times the original length, plus the terminator). Very easy to maintain.
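#include <string.h>
void urlEncode(char *string)
{
    unsigned char charToEncode;
    size_t posToEncode;
    /* strspn returns the length of the leading run of unreserved characters;
       a result of 0 means the very first character needs encoding */
    while ((posToEncode=strspn(string,"1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~"))<strlen(string))
    {
        /* unsigned, so bytes >= 0x80 index the hex table correctly */
        charToEncode=(unsigned char)string[posToEncode];
        /* shift the tail (including the terminating NUL) two bytes to the right */
        memmove(string+posToEncode+3,string+posToEncode+1,strlen(string+posToEncode));
        string[posToEncode]='%';
        string[posToEncode+1]="0123456789ABCDEF"[charToEncode>>4];
        string[posToEncode+2]="0123456789ABCDEF"[charToEncode&0xf];
        string+=posToEncode+3;
    }
}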

Had to do it in a project without Boost. So, ended up writing my own. I will just put it on GitHub: https://github.com/corporateshark/LUrlParser
clParseURL URL = clParseURL::ParseURL( "https://name:pwd@github.com:80/path/res" );
if ( URL.IsValid() )
{
cout << "Scheme : " << URL.m_Scheme << endl;
cout << "Host : " << URL.m_Host << endl;
cout << "Port : " << URL.m_Port << endl;
cout << "Path : " << URL.m_Path << endl;
cout << "Query : " << URL.m_Query << endl;
cout << "Fragment : " << URL.m_Fragment << endl;
cout << "User name : " << URL.m_UserName << endl;
cout << "Password : " << URL.m_Password << endl;
}

-
Your link is to a library which parses a URL. It does not %-encode a URL. (Or at least, I couldn't see a % anywhere in the source.) As such, I don't think this answers the question. – Martin Bonner supports Monica Nov 20 '15 at 13:23