# Load RCurl and POST the export request to the REDCap API.
library(RCurl)
RAW.API <- postForm(REDcap.URL, token = Redcap.token, content = "record",
       type = "flat", format = "csv", rawOrLabel = "label",
       .opts = curlOptions(ssl.verifypeer = TRUE, cainfo = REDCap.crt,
       verbose = FALSE))

data <- read.table(file = textConnection(RAW.API), header = TRUE,
        sep = ",", na.strings = "", stringsAsFactors = FALSE)

I am using the code above to pull data from REDCap into R. The problem is that with a large dataset (in my case >19,000 records) it takes a long time and sometimes aborts. Is there a way to improve the code above, or perhaps to subset the data by date?

    Your sample code is insufficient as a [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example since important variables are missing for us to be able to run the same code. It's unclear to me whether you are saying the problem is with the http download or just creating the data.frame. Filtering in R cannot be done prior to reading in the data; if you want to reduce the number of records returned see if your Redcap API has filtering options. There's not a lot we can help you with without a reproducible example. – MrFlick Sep 16 '14 at 15:36
  • 1
    There's quite a bit of info on interfacing with REDCap in [these slides](https://github.com/sburns/advanced-redcap-interfaces/blob/master/slides.md) Did you try switching to `httr` and using the `verbose()` option? Since it's a layer on top of `RCurl`, it takes all the options you need and may be easier to debug. Also, have you tried the same `postForm` from the command-line `curl`? (examples for that are on that slides link) – hrbrmstr Sep 16 '14 at 15:45
  • 1
    There is also a [redcapAPI](http://cran.r-project.org/web/packages/redcapAPI/) package, which might solve all of your problems entirely. – Thomas Sep 17 '14 at 05:51
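Following up on the `httr` suggestion above, here is a minimal sketch of the same request rewritten with `httr` (reusing the question's `REDcap.URL` and `Redcap.token` variables, which are assumed to be defined):

```r
library(httr)

# POST the same form fields; encode = "form" mirrors postForm()'s behavior.
response <- POST(
  url    = REDcap.URL,
  body   = list(token      = Redcap.token,
                content    = "record",
                type       = "flat",
                format     = "csv",
                rawOrLabel = "label"),
  encode = "form"
)
stop_for_status(response)  # fail loudly on HTTP errors instead of hanging

# Parse the CSV payload into a data.frame.
data <- read.csv(text = content(response, as = "text"), na.strings = "",
                 stringsAsFactors = FALSE)
```

Wrapping the call in `verbose()` (as suggested in the comment) prints the request/response exchange, which makes timeouts easier to diagnose.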

1 Answer


Consider letting one of the existing R packages handle some of the low-level code. Both REDCapR and redcapAPI return the data as a data.frame. They were developed by two independent teams, but we contribute to each other's packages and frequently communicate.

Regarding your specific situation, I suspect that the packages' "batching" will help you. Under the covers, both packages retrieve subsets of the data and then append them together before returning the unified data.frame. Currently batching doesn't make the overall operation faster, but it substantially helps avoid timeouts.
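As an illustration, a batched read with REDCapR might look like the sketch below; the URL and token are placeholders you would replace with your own:

```r
library(REDCapR)

# redcap_read() retrieves the records in batches (100 rows each here)
# and appends them into a single data.frame before returning, which
# keeps any single request small enough to avoid server timeouts.
result <- redcap_read(
  redcap_uri = "https://redcap.example.edu/api/",  # placeholder URL
  token      = "YOUR_API_TOKEN",                   # placeholder token
  batch_size = 100
)
ds <- result$data  # the unified data.frame
```

Lowering `batch_size` trades more round-trips for a smaller, safer payload per request.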

For general advice, REDCapR has some vignettes on CRAN and redcapAPI has a wiki.

Is there anything else that should be added, @Benjamin?

wibeasley
    I think @wibeasley has probably diagnosed your issue correctly. This is an issue I've seen discussed for any language accessing the API. REDCapR probably handles this a little better than redcapAPI, but I've used redcapAPI to import a dataset as large as 37,000 subjects. – Benjamin Sep 26 '14 at 21:01
  • 1
    You could also try exporting a report. The report will do a query that might limit the data to a small enough set that you can complete the export, but you'll still be subject to the same risk of timeout. You'll need to be using REDCap version 6.0 or higher for this to be an option. – Benjamin Sep 26 '14 at 21:04
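To sketch the report approach Benjamin describes (REDCap 6.0+), the request is the same as the question's except `content = "report"` and a `report_id`; the ID below is a placeholder for your saved report:

```r
library(RCurl)

# Export a saved report instead of all records; the report's own filter
# limits how much data the server must return in one request.
RAW.API <- postForm(REDcap.URL, token = Redcap.token,
                    content = "report",
                    report_id = "1234",   # placeholder: your report's ID
                    format = "csv", rawOrLabel = "label",
                    .opts = curlOptions(ssl.verifypeer = TRUE,
                                        cainfo = REDCap.crt))
data <- read.csv(text = RAW.API, na.strings = "", stringsAsFactors = FALSE)
```

Note that, as the comment says, a report export is still subject to the same timeout risk if the report itself is large.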