
I've got a manual process where I upload a 5-6 GB file to a web server via curl:

curl -X POST --data-binary @myfile.csv http://myserver:port/path/to/api

This process works fine, but I'd love to automate it using R. The problem is, I either don't know what I'm doing, or the R libraries for curl don't know how to handle files bigger than ~2GB:

library(RCurl)
postForm(
  "http://myserver:port/path/to/api",
  file = fileUpload(
    filename = path.expand("myfile.csv"),
    contentType = "text/csv"
  ),
  .encoding = "utf-8"
)

This yields Error: Internal Server Error

httr doesn't work either:

library(httr)
POST(
  url = "http://myserver:port/path/to/api",
  body = upload_file(
    path = path.expand("myfile.csv"),
    type = "text/csv"
  ),
  verbose()
)

Which yields:

Response [http://myserver:port/path/to/api]
  Date: 2015-06-30 11:11
  Status: 400
  Content-Type: <unknown>
<EMPTY BODY>

httr is a little more informative with the verbose() option, telling me:

-> POST http://myserver:port/path/to/api
-> User-Agent: libcurl/7.35.0 r-curl/0.9 httr/1.0.0
-> Host: http://myserver::port
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: text/csv
-> Content-Length: -2147483648
-> Expect: 100-continue
-> 
<- HTTP/1.1 400 Bad Request
<- Server: Apache-Coyote/1.1
<- Transfer-Encoding: chunked
<- Date: Tue, 30 Jun 2015 11:11:11 GMT
<- Connection: close
<- 

The Content-Length: -2147483648 looks suspiciously like a 32-bit integer overflow (the value is exactly -2^31, the smallest signed 32-bit integer), so I think this is a bug in httr. I suspect RCurl is experiencing a similar failure.
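As a quick sanity check on that hunch, here is a minimal sketch in base R. The 5.5 GiB size and the helper name `fits_in_int32` are assumptions made up for illustration; the sketch only shows that a file this large cannot be represented as a signed 32-bit integer, and says nothing about where in httr or curl the conversion actually happens.

```r
# Signed 32-bit bounds. R's own integer type is 32-bit, so
# .Machine$integer.max is 2^31 - 1.
int32_max <- .Machine$integer.max   # 2147483647
int32_min <- -2^31                  # -2147483648, stored as a double

# Hypothetical size for illustration: 5.5 GiB in bytes (a double).
size_bytes <- 5.5 * 1024^3

# TRUE when a byte count can be stored in a signed 32-bit integer.
fits_in_int32 <- function(n) n >= int32_min && n <= int32_max

fits_in_int32(size_bytes)                  # FALSE
suppressWarnings(as.integer(size_bytes))   # NA: R refuses to coerce it
format(int32_min, scientific = FALSE)      # "-2147483648"
```

The last line reproduces the exact value seen in the Content-Length header, which is consistent with the size being forced into a 32-bit integer somewhere along the way.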

I'd really love a minimal wrapper around curl -X POST --data-binary, but barring that, what are my options for uploading fairly large files from R?

Zach
  • I assume you're using the latest version of **httr**, which uses the [curl](https://github.com/jeroenooms/curl/tree/master/R) R package. If you can't get it to work using Jeroen's package directly (bypassing httr), it might be faster to create an issue on GitHub. – joran Jun 30 '15 at 20:51
  • @joran Yes, I am using httr, which depends on curl. I made a GitHub issue, but in the meantime I'm curious to know if anyone has ever uploaded a 2.2GB+ file to a web service from R. I can't be the first person in history to try to do this... – Zach Jun 30 '15 at 20:52
  • In the meantime, you could probably use `system` to invoke curl directly. – tonytonov Jul 01 '15 at 11:09
  • @tonytonov Good idea, I'll try that for now. – Zach Jul 01 '15 at 13:42
  • Not that it helps hugely, but you might like to add what OS and version, and what R version you're using. – smci Jul 19 '15 at 16:28
  • http://www.revolutionanalytics.com/academic-and-public-service-programs Have you tried the RevoScaleR package? – costebk08 Jul 20 '15 at 05:13
  • @costebk08 I don't think RevoScaleR includes a replacement for curl. – Zach Jul 20 '15 at 17:51
  • @BrandonBertelsen See tonytonov's comment – Zach Jul 24 '15 at 19:32
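
tonytonov's suggestion of shelling out to curl can be sketched roughly as follows. This is a minimal wrapper under stated assumptions, not a tested solution: it assumes a `curl` binary is on the PATH, and the function names `curl_post_args` and `curl_post_file` are hypothetical names chosen for this example. It uses base R's `system2` rather than `system` so the arguments can be passed as a vector.

```r
# Build the argument vector for: curl -X POST --data-binary @file url
curl_post_args <- function(file, url) {
  c("-X", "POST", "--data-binary", paste0("@", file), url)
}

# Run the curl binary on the file and return curl's exit status
# (0 means curl itself reported success).
curl_post_file <- function(file, url, curl_bin = Sys.which("curl")) {
  file <- path.expand(file)
  stopifnot(file.exists(file), nzchar(curl_bin))
  # shQuote protects paths and URLs containing spaces or shell metacharacters.
  system2(curl_bin, args = shQuote(curl_post_args(file, url)))
}
```

A call would look like `status <- curl_post_file("myfile.csv", "http://myserver:port/path/to/api")`, with a non-zero `status` indicating that curl failed. Because the upload happens in the external curl process, the file size never passes through an R integer.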

1 Answer


This bug is fixed in the dev version of httr/curl:

devtools::install_github("jeroenooms/curl")
devtools::install_github("hadley/httr")

This is a bug in the httr and curl packages for R. The bug has been fixed on GitHub as of July 2, 2015, and the change will roll out to CRAN soon.

It is also possible I was calling RCurl incorrectly in the command from the question, but I could never figure out the correct invocation.

Thomas
Zach
  • _If_ you actually copy-pasted the above command, you mistyped **uft-8**; it's **utf-8** – zerweck Jul 24 '15 at 14:26
  • @zerweck Good catch! I think you can use the `(edit)` button below my post to suggest edits, which I can then review. – Zach Jul 24 '15 at 14:54