How to save large output sufficiently fast in text or any other format?

Question

My question is: how can I save the output, i.e. mydata

 mydata=array(sample(100),dim=c(2,100,4000))

reasonably fast?

I used the reshape2 package as suggested here.

 melt(mydata)

and

 write.table(mydata,file="data_1")

But it takes more than an hour to save the data to the file. I am looking for a faster way to do the job.

`saveRDS` and `readRDS` will probably be the fastest way to save and load R objects. — nrussell, Apr 25 '16 at 11:34
I'm working with genotyping data myself, these rows can contain about a million columns, if your case is anything like that you might consider writing per one or two lines and use `append = TRUE` — Bas, Apr 25 '16 at 11:41
I haven't digested it myself, but you might find the discussion [here](http://www.r-bloggers.com/fast-csv-writing-for-r/) relevant (posted today). — Bryan Hanson, Apr 25 '16 at 11:51
see [this SO question](http://stackoverflow.com/questions/10505605/speeding-up-the-performance-of-write-table), compares feather, fwrite, saveRDS — phiver, Apr 25 '16 at 11:55
@nrussell. It is what I was looking for, very fast in both ways (save and read). Thank you very much. — Janak, Apr 25 '16 at 12:02
@hrbrmstr. I am facing problem when installing the "curl " package in "readr" package. My configuration failed for package ‘curl’. Will try again later with re-installation of r. — Janak, Apr 25 '16 at 12:17
`fwrite` prbly also works (wasn't sure if it was in CRAN `data.table`). You need `libcurl` on your system if not running Windows for the `curl` pkg. — hrbrmstr, Apr 25 '16 at 12:49

score 4 · Accepted Answer · edited May 23 '17 at 11:44

I strongly suggest reading this great post, which helps clarify the issues around saving files.

Anyway, saveRDS is probably the most suitable option for you. The most relevant difference in this case is that save can write many objects to a file in a single call, whereas saveRDS, being a lower-level function, works with a single object at a time.

save and load allow you to save a named R object to a file or other connection and restore that object again. But, when loaded, the named object is restored to the current environment with the same name it had when saved.

saveRDS and readRDS, instead, allow you to save a single R object to a connection (typically a file) and to restore the object, possibly under a different name. This lower-level operation probably makes the RDS functions more efficient for your case.
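
To illustrate the naming difference described above, here is a minimal sketch using the array from the question. The temporary file paths and the variable names `original`, `copy` and `restored_names` are assumptions made for the example, not part of the original post.

```r
# Sketch: save()/load() restore an object under its saved name,
# while saveRDS()/readRDS() return a value you can assign to any name.
mydata <- array(sample(100), dim = c(2, 100, 4000))  # array from the question
original <- mydata                                   # keep a copy for comparison

rda_file <- tempfile(fileext = ".RData")  # hypothetical temporary paths
rds_file <- tempfile(fileext = ".rds")

# save() stores the object together with its name.
save(mydata, file = rda_file)
rm(mydata)
restored_names <- load(rda_file)  # recreates `mydata`; returns the restored names

# saveRDS() stores only the value, so the caller chooses the name on reading.
saveRDS(original, file = rds_file)
copy <- readRDS(rds_file)

print(restored_names)
print(identical(mydata, original))
print(identical(copy, original))
```

Because readRDS returns the object instead of writing it into the environment, it cannot silently overwrite an existing variable that happens to share the saved name.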

score 2 · Answer 2 · answered Apr 25 '16 at 11:46

Read the help text for saveRDS using ?saveRDS. This will probably be the best way for you to save and load large dataframes.

saveRDS(yourdata, file = "yourdata.Rda")

AlternativeHacks


Don't forget to set the option `compress = FALSE`, otherwise you miss the speed benefits. – phiver Apr 25 '16 at 11:56
Thanks for the answer even though it is working fast without using compress = FALSE. – Janak Apr 25 '16 at 12:07
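
A rough way to check the effect of `compress = FALSE` on your own data is sketched below. It uses the array from the question and temporary files; the variable names are assumptions for illustration. Timings depend on the machine and the data, so no figures are claimed here.

```r
# Sketch: compare saveRDS() with its default compression and with compress = FALSE.
mydata <- array(sample(100), dim = c(2, 100, 4000))  # array from the question

compressed_file   <- tempfile(fileext = ".rds")  # hypothetical temporary paths
uncompressed_file <- tempfile(fileext = ".rds")

time_compressed   <- system.time(saveRDS(mydata, compressed_file))
time_uncompressed <- system.time(saveRDS(mydata, uncompressed_file, compress = FALSE))

size_compressed   <- file.size(compressed_file)
size_uncompressed <- file.size(uncompressed_file)

# Elapsed seconds and file sizes in bytes for each variant.
print(c(compressed   = time_compressed[["elapsed"]],
        uncompressed = time_uncompressed[["elapsed"]]))
print(c(compressed = size_compressed, uncompressed = size_uncompressed))

# Both files restore the same object.
print(identical(readRDS(compressed_file), readRDS(uncompressed_file)))
```

Skipping compression trades a larger file for less CPU work when writing, so whether it pays off depends on disk speed and on how compressible the data is.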
