
I am constructing a URI in R, generated on the fly, that is roughly 40,000 characters long.

I tried using

  • RCurl

  • jsonlite

  • curl

All three give a "bad URL" error when connecting through an HTTP GET request. I am refraining from using httr because it would install 5 additional dependencies, and I want minimal dependencies in my R program. I am also unsure whether even httr could handle a URL with that many characters.

Is there a way to encode/pack the URI down to an allowed limit, or a better approach/package that can handle URLs of any length, similar to Python's urllib?

Thanks in advance.

user6591903
  • From these answers about 2000 characters seems like a maximum. See these related questions: http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers, http://stackoverflow.com/questions/2659952/maximum-length-of-http-get-request. – Dave2e Jul 15 '16 at 01:20
  • looking at `jsonlite::fromJSON` you see it checks whether `txt` (which in your case is a URL) is `< 1000 "bytes"` ([`jsonlite::fromJSON` source code](https://github.com/jeroenooms/jsonlite/blob/master/R/fromJSON.R) ) – SymbolixAU Jul 15 '16 at 01:53
  • So, is there no way that I can access long URLs through R other than converting this `GET` request to a `POST` request, which I think defeats the whole purpose of the `HTTP` Protocol? – user6591903 Jul 15 '16 at 05:03
  • Where is the "bad URL" error really coming from? The R-client part that is making the request, or from the server itself? Servers are free to limit their URL length and should return error 414 if asked to serve something too long. Can you give some code that generates the error? – Spacedman Jul 15 '16 at 11:00
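One way to narrow down where the error originates, as the comment above suggests, is to inspect the HTTP status code directly. A minimal sketch using the `curl` package (one of the packages already tried); `long_url` is a placeholder for the generated ~40,000-character URI:

```r
library(curl)

# Fetch the URL and look at the status code the server actually returned.
# A 414 here means the server itself rejected the URL as too long;
# an error thrown before any status code arrives points at the client side.
res <- curl_fetch_memory(long_url)
res$status_code
```

If `curl_fetch_memory` throws before returning, the failure happened in the client or the network layer, not in the server's URL-length policy.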

1 Answer


This is not a limitation of RCurl.

Let's make a long URL and try it:

> s = paste0(rep(letters,2000),collapse="")
> nchar(s)
[1] 52000

That's 52,000 characters of a-z. Stick it on a URL:

> url = paste0("http://www.omegahat.net/RCurl/", s)
> nchar(url)
[1] 52030
> substr(url, 1, 40)
[1] "http://www.omegahat.net/RCurl/abcdefghij"

Now try and get it:

> library(RCurl)
> txt = getURL(url)
> txt
[1] "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>414 Request-URI Too Large</title>\n</head><body>\n<h1>Request-URI Too Large</h1>\n<p>The requested URL's length exceeds the capacity\nlimit for this server.<br />\n</p>\n</body></html>\n"
> 

That's the correct response from the server. The server decided the URL was too long and returned a 414 error, which shows that RCurl can happily send requests for URLs of over 40,000 characters.
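If you want the numeric status code rather than parsing the HTML error page, RCurl can also collect the response headers. A sketch, assuming `url` is the long URL built above:

```r
library(RCurl)

# Gather the response headers alongside the body, then read the status code.
h <- basicHeaderGatherer()
txt <- getURL(url, headerfunction = h$update)
h$value()["status"]   # expect "414" from this server, per the body shown above
```

Checking the status this way makes it unambiguous that the rejection came from the server, not from RCurl.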

Until we know more, I can only presume the "bad URL" message is coming from the server, about which we know nothing.

Spacedman
  • The error was due to a newline being generated every time. Also, I used `rjson::fromJSON` to parse the `JSON` response from RCurl back into R. This was much faster for larger data compared to `jsonlite::fromJSON`. Still, I am in two minds about using a binary format for data exchange instead of JSON. Thank you so much. Appreciate the help. – user6591903 Jul 15 '16 at 23:59
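For reference, the stray-newline problem described in the comment above can be guarded against before the request is made. A sketch, assuming `url` holds the generated URI:

```r
# Strip any newlines/carriage returns the URL-building code may have
# introduced, then percent-encode characters that are not URL-safe.
url <- gsub("[\r\n]", "", url)
url <- URLencode(url)
```

`URLencode` is in the base `utils` package, so this adds no dependencies.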