My question is: how can I save the output, i.e. mydata

 mydata=array(sample(100),dim=c(2,100,4000))

reasonably fast?

I used the reshape2 package as suggested here.

 melt(mydata)

and

 write.table(mydata,file="data_1")

But it takes more than an hour to save the data to the file. I am looking for a faster way to do the job.

Janak
  • `saveRDS` and `readRDS` will probably be the fastest way to save and load R objects. – nrussell Apr 25 '16 at 11:34
  • `readr::write_delim(mydata, path="data_1")` – hrbrmstr Apr 25 '16 at 11:40
  • I'm working with genotyping data myself; these rows can contain about a million columns. If your case is anything like that, you might consider writing one or two lines at a time and using `append = TRUE`. – Bas Apr 25 '16 at 11:41
  • I haven't digested it myself, but you might find the discussion [here](http://www.r-bloggers.com/fast-csv-writing-for-r/) relevant (posted today). – Bryan Hanson Apr 25 '16 at 11:51
  • See [this SO question](http://stackoverflow.com/questions/10505605/speeding-up-the-performance-of-write-table), which compares feather, fwrite, and saveRDS. – phiver Apr 25 '16 at 11:55
  • @nrussell. That is what I was looking for; it is very fast both ways (saving and reading). Thank you very much. – Janak Apr 25 '16 at 12:02
  • @hrbrmstr. I ran into a problem installing the "curl" package while installing "readr": configuration failed for package ‘curl’. I will try again later after reinstalling R. – Janak Apr 25 '16 at 12:17
  • `fwrite` probably also works (I wasn't sure if it was in the CRAN `data.table`). You need `libcurl` on your system for the `curl` package if you are not running Windows. – hrbrmstr Apr 25 '16 at 12:49
  • @hrbrmstr. Thanks. It is working. – Janak Apr 27 '16 at 05:06

2 Answers

I strongly suggest reading this great post, which helps clarify the issues around saving files.

Anyway, saveRDS is probably the best fit for you. The most relevant difference here is that save can write many objects to a file in a single call, whereas saveRDS, being a lower-level function, works with a single object at a time.

save and load allow you to save a named R object to a file or other connection and restore that object again. But, when loaded, the named object is restored to the current environment with the same name it had when saved.

saveRDS and readRDS, by contrast, let you save a single R object to a connection (typically a file) and restore it, possibly under a different name. Being lower-level probably makes the RDS functions more efficient for your case.
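As a minimal sketch of that difference, using the array from the question: the temporary file names and the variables `restored`, `env`, and `loaded_names` are only illustrative assumptions, not part of the original question.

```r
# The array from the question (array() recycles the 100 sampled values).
mydata <- array(sample(100), dim = c(2, 100, 4000))

# Hypothetical file locations, chosen only for this sketch.
rds_file <- tempfile(fileext = ".rds")
rda_file <- tempfile(fileext = ".RData")

# saveRDS/readRDS: a single object, stored without its name,
# so it can be assigned to any name when read back.
saveRDS(mydata, file = rds_file)
restored <- readRDS(rds_file)

# save/load: the object is stored together with its name; load() recreates
# `mydata` in the target environment (a fresh one here, so nothing is overwritten).
save(mydata, file = rda_file)
env <- new.env()
loaded_names <- load(rda_file, envir = env)

identical(restored, mydata)    # the round trip preserves the array
identical(env$mydata, mydata)  # restored under its original name
```

Note that `load()` returns the names of the objects it restored, which is handy when you do not remember what a file contains.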

Worice

Read the help text for saveRDS using ?saveRDS. This will probably be the best way for you to save and load large R objects such as data frames or arrays.

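saveRDS(yourdata, file = "yourdata.rds")   # .rds is the conventional extension for saveRDS
yourdata <- readRDS("yourdata.rds")        # read the object back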
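
To check the speed difference on your own machine, a rough timing sketch along these lines may help. It assumes a much smaller array than the one in the question so that the text export finishes quickly; the names `small`, `txt_file`, and `rds_file` are made up for illustration, and the timings will vary by system.

```r
# A smaller array than in the question, so write.table() finishes quickly.
small <- array(sample(100), dim = c(2, 100, 40))

# Hypothetical temporary files used only for this comparison.
txt_file <- tempfile(fileext = ".txt")
rds_file <- tempfile(fileext = ".rds")

# write.table() first converts input that is neither a matrix nor a data frame
# into a data frame, then formats every value as text.
t_txt <- system.time(write.table(small, file = txt_file))

# saveRDS() serializes the object directly in R's binary format.
t_rds <- system.time(saveRDS(small, file = rds_file))

cat("write.table:", t_txt[["elapsed"]], "s\n")
cat("saveRDS:    ", t_rds[["elapsed"]], "s\n")
```

Only the RDS file preserves the array's dimensions on reading; the text file holds a flattened table.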
AlternativeHacks