I've checked several related questions, such as this one:
How to load data quickly into R?
I'm quoting a specific part of the highest-rated answer:
It depends on what you want to do and how you process the data further. In any case, loading from a binary R object is always going to be faster, provided you always need the same dataset. The limiting speed here is the speed of your harddrive, not R. The binary form is the internal representation of the dataframe in the workspace, so there is no transformation needed anymore
I really thought that. However, life is about experimenting. I have a 1.22 GB file containing an igraph object. That said, I don't think what I found here is related to the object class, mainly because you can load('file.RData') even before you call library(igraph).
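For instance, a minimal check of that point (a sketch; g is the name of the object stored in my file):

load('mygraph.RData')   # restores 'g' in a fresh session, no library(igraph) needed
class(g)                # "igraph" -- the class attribute travels with the data
library(igraph)         # only required once you actually call igraph functions on g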
The disks in this server are pretty fast, as you can see from the raw read time into memory:
user@machine data$ pv mygraph.RData > /dev/null
1.22GB 0:00:03 [ 384MB/s] [==================================>] 100%
However, when I load this data from R:
> system.time(load('mygraph.RData'))
user system elapsed
178.533 16.490 202.662
So loading the *.RData file is about 60 times slower than the disk limit, which means R is actually doing real work during load(), not just reading bytes from disk.
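The effect is easy to reproduce with a plain data structure, so it is not igraph-specific (a rough sketch, sizes approximate):

x <- replicate(100, rnorm(1e6), simplify = FALSE)  # ~800 MB of doubles
save(x, file = 'x.RData')                          # gzip-compressed by default
system.time(load('x.RData'))                       # far slower than the raw disk read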
I've had the same feeling using different R versions on different hardware; it's just that this time I had the patience to do some benchmarking (mainly because with such fast disk storage, it was painful how long the load actually takes).
Any ideas on how to overcome this?
Edit, after the ideas in the answers:
save(g,file="test.RData",compress=F)
Now the file is 3.1 GB against 1.22 GB before. In my case, loading the uncompressed file is a bit faster (the disk is not my bottleneck, by far):
> system.time(load('test.RData'))
user system elapsed
126.254 2.701 128.974
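As an aside, save() also accepts intermediate codecs if disk space matters more than load time (a sketch; the file names are my own):

save(g, file = 'test_gz.RData', compress = 'gzip')  # the default: small file, slow load
save(g, file = 'test_xz.RData', compress = 'xz')    # smaller still, slower to write
save(g, file = 'test_raw.RData', compress = FALSE)  # biggest file, fastest load here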
Reading the uncompressed file into memory takes about 12 seconds, so I confirm most of the time is spent setting up the environment.
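That 12-second figure can be checked from within R as well (a sketch; readBin caps a single call at 2^31 - 1 items, so the 3.1 GB file is read in two chunks):

con <- file('test.RData', 'rb')
system.time({
  chunk1 <- readBin(con, what = 'raw', n = 2e9)  # first ~2 GB
  chunk2 <- readBin(con, what = 'raw', n = 2e9)  # the remaining ~1.1 GB
})
close(con)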
I'll be back with RDS results; that sounds interesting.
Here we are, as promised:
system.time(saveRDS(g,file="test2.RData",compress=F))
user system elapsed
7.714 2.820 18.112
And I get a 3.1 GB file, just like with save() uncompressed, although the md5sum is different, probably because save() also stores the object name.
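That naming difference is easy to see, because load() returns the names it restored (a sketch):

restored <- load('test.RData')  # save()/load() recreate the original binding
restored                        # "g" -- the name is stored in the file
                                # saveRDS() stores only the value, with no name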
Now reading...
> system.time(a<-readRDS('test2.RData'))
user system elapsed
41.902 2.166 44.077
So combining both ideas (uncompressed and RDS) runs about 5 times faster (44 s versus the original 202 s). Thanks for your contributions!
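For reference, the combined recipe, using the conventional .rds extension (my choice, not required):

saveRDS(g, file = 'mygraph.rds', compress = FALSE)  # no name stored, no compression
g <- readRDS('mygraph.rds')                         # ~5x faster than load() on the gzip .RData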