
I need to use R to decode strings that were URL-encoded according to RFC 1738.

These strings come from PHP requests and are encoded with the PHP function `http_build_query`, which uses RFC 1738 by default.

For example, `Hello, world!` becomes `Hello%2C+world%21`.

You can easily try it online with the following PHP code:

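```php
<?php
// Build a query string with PHP's default (RFC 1738) encoding
$data = array(
    'text' => 'Hello, world!'
);

echo http_build_query($data, '', '&');
// prints: text=Hello%2C+world%21
```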
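To see exactly which bytes this scheme touches, here is a rough R re-implementation of PHP's `urlencode()`. It is only a sketch: `php_urlencode` is a hypothetical helper name, and it assumes that `http_build_query` with its default `PHP_QUERY_RFC1738` encodes each key and value the way `urlencode()` does (ASCII letters, digits and `-_.` kept as-is, space turned into `+`, every other byte of the UTF-8 text written as uppercase `%XX`).

```r
# Hypothetical helper: mimics PHP's urlencode() on UTF-8 input (assumption).
php_urlencode <- function(x) {
  vapply(x, function(s) {
    bytes <- as.integer(charToRaw(enc2utf8(s)))
    keep <- (bytes >= 0x30 & bytes <= 0x39) |   # 0-9
            (bytes >= 0x41 & bytes <= 0x5A) |   # A-Z
            (bytes >= 0x61 & bytes <= 0x7A) |   # a-z
            bytes %in% c(0x2D, 0x2E, 0x5F)      # - . _
    out <- sprintf("%%%02X", bytes)             # default: percent-escape every byte
    out[bytes == 0x20] <- "+"                   # space becomes "+"
    out[keep] <- intToUtf8(bytes[keep], multiple = TRUE)  # unreserved bytes pass through
    paste(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

php_urlencode("Hello, world!")
# "Hello%2C+world%21"
```

Under this scheme a literal `+` is escaped as `%2B` (for example, `php_urlencode("a+b c")` gives `"a%2Bb+c"`), so in the encoded output a bare `+` can only ever stand for a space.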

Unfortunately, base R's URL encoding follows RFC 3986, which is different!

```r
utils::URLdecode("Hello%2C+world%21")
# "Hello,+world!"

# I was hoping httpuv would have an option for RFC 1738 but it doesn't!
httpuv::decodeURIComponent("Hello%2C+world%21")
# "Hello,+world!"
```

I want to get "Hello, world!" back! How can I do that with R?

I am happy to simply replace `+` signs with spaces, but if that is your answer, please justify why it is correct. My research so far has not shown that RFC 1738 is exactly RFC 3986 plus the `+`-for-space rule, and I don't want weird edge cases...
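A minimal sketch of the plus-then-percent approach, assuming the input was produced with PHP's `urlencode()` rules (what `http_build_query` applies by default) and that the original text was UTF-8. `form_urldecode` is a hypothetical name. The order of the two steps is the point: because that encoder escapes a literal `+` as `%2B`, every bare `+` in the input must be a space, and replacing it *before* percent-decoding ensures a decoded `%2B` is never turned into a space.

```r
# Hypothetical helper: decode application/x-www-form-urlencoded values.
form_urldecode <- function(x) {
  # Step 1: a bare "+" can only be an encoded space (a real "+" arrives as "%2B").
  x <- gsub("+", " ", x, fixed = TRUE)
  # Step 2: decode %XX escapes; utils::URLdecode() leaves "+" untouched anyway.
  out <- vapply(x, utils::URLdecode, character(1), USE.NAMES = FALSE)
  # Assumption: the PHP side encoded UTF-8 text, so mark the result as such.
  Encoding(out) <- "UTF-8"
  out
}

form_urldecode("Hello%2C+world%21")
# "Hello, world!"
form_urldecode("1%2B1+%3D+2")
# "1+1 = 2"
```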

asachet
  • This issue might be unrelated to RFC 1738 and instead be due to the *application/x-www-form-urlencoded* media type, see also https://stackoverflow.com/questions/2678551/when-to-encode-space-to-plus-or-20. R's `utils::URLencode` used RFC 1738 before R 3.2.1 (check https://cran.r-project.org/doc/manuals/r-devel/NEWS.html), but it encoded spaces as `%20` even then. Note that `http_build_query` also accepts an `enc_type` argument, which can be set to `PHP_QUERY_RFC3986` and which will encode spaces as `%20` (https://www.php.net/manual/en/function.http-build-query.php). – Joris C. Jul 10 '19 at 07:58
  • Thanks Joris, alas I have no control whatsoever over the PHP server. I cannot change the media type or the `enc_type`. I currently ignore the media type because I only get queries from that PHP server, so I know for sure all strings are _always_ RFC1738 encoded. – asachet Jul 10 '19 at 08:03
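Since `http_build_query` produces whole `key=value&key=value` strings in the *application/x-www-form-urlencoded* format, a full query can be split on `&` and on the first `=` of each pair, and each piece decoded with the same plus-then-percent rule. This is a hedged sketch: `parse_form_query` is a hypothetical helper, it assumes `&` as the separator (as in the PHP call in the question) and UTF-8 text, and it does not rebuild PHP's nested arrays, so a key such as `a%5B0%5D` simply comes back as the literal string `a[0]`.

```r
# Hypothetical helper: parse a whole http_build_query() string into a named list.
parse_form_query <- function(query) {
  # Same two-step decode: "+" -> space first, then %XX escapes.
  decode <- function(s) {
    out <- utils::URLdecode(gsub("+", " ", s, fixed = TRUE))
    Encoding(out) <- "UTF-8"  # assumption: the PHP side sent UTF-8
    out
  }
  # Split on "&" (the separator passed to http_build_query) and drop empty pieces.
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  pairs <- pairs[nzchar(pairs)]
  # Split each pair on its FIRST "=" only; a pair without "=" gets an empty value.
  eq <- as.integer(regexpr("=", pairs, fixed = TRUE))
  keys <- ifelse(eq > 0, substr(pairs, 1, eq - 1), pairs)
  vals <- ifelse(eq > 0, substring(pairs, eq + 1), "")
  stats::setNames(
    as.list(vapply(vals, decode, character(1), USE.NAMES = FALSE)),
    vapply(keys, decode, character(1), USE.NAMES = FALSE)
  )
}

str(parse_form_query("text=Hello%2C+world%21&n=1"))
```

Values are returned as character strings; converting types such as `n` to a number is left to the caller.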
