I currently have a 400 million row by 8 column R data.table that is around 13 GB. I have tried to use `write.csv` to no avail. I also read here about someone using SQLite, but even that didn't work. My main motivation is to convert the table into something that can be easily read by MATLAB, S-Plus, etc. Converting it into .RData form is very efficient, compressing it down to 900 MB. Is there something similar, but more applicable to other programs? Thanks!
- Maybe `hdf5`? A few more details about the contents (e.g. is it entirely numeric/double, or does it contain other data types?) might be useful. (An HDF5 sketch appears after these comments.) – Ben Bolker Oct 06 '14 at 00:05
- What is the problem with `write.csv`? Does it fail or freeze? You could open a file connection and write rows in chunks of a few thousand (see the chunked-write sketch after these comments). I doubt this will ever fail. You can put a `print` in the loop to get an idea of how long it will take. And when you are done writing the CSV, zip it. – flodel Oct 06 '14 at 00:10
- @flodel So you mean something like a loop writing to an open file connection? – user1398057 Oct 06 '14 at 00:30
- What do you mean by "to no avail" and "that didn't work"? Did they fail with an error? Take too long? Create too large an output file? – josliber Oct 06 '14 at 00:35
- To echo @josliber, what was the specific error when trying `write.csv` (you do know this operation will take some time, right?), and what didn't work from the SO answer that uses `RSQLite`? – hrbrmstr Oct 06 '14 at 02:45
- The `write.csv` command would take forever. I was using an Amazon server with 244 GB of RAM and I stopped the process after 2 hours. I was wondering if there was an easier way. – user1398057 Oct 06 '14 at 03:33
- http://stackoverflow.com/questions/12013953/write-csv-for-large-data-table-in-r – shadowtalker Oct 06 '14 at 04:44
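
Below is a minimal sketch of the chunked approach flodel describes: open one file connection, write the header once, then write the table a slice at a time with a progress `print`. The chunk size, output file name, and the small stand-in table `DT` are my assumptions, not details from the question.

```r
library(data.table)

## stand-in for the real 400M-row table (column names and size are assumptions)
DT <- data.table(a = rnorm(1e6), b = runif(1e6))

con    <- file("big_table.csv", open = "w")   # one connection, opened once
chunk  <- 100000L                             # rows per write; tune to taste
starts <- seq(1L, nrow(DT), by = chunk)

for (i in seq_along(starts)) {
  rows <- starts[i]:min(starts[i] + chunk - 1L, nrow(DT))
  ## writes to an open connection go out sequentially, so no append= is needed
  write.table(DT[rows], file = con, sep = ",",
              row.names = FALSE, col.names = (i == 1L))  # header on first chunk only
  print(sprintf("chunk %d of %d written", i, length(starts)))
}
close(con)

## zip("big_table.zip", "big_table.csv")      # compress afterwards, as flodel suggests
```

Writing 400 million rows this way will still take a while, but the progress print tells you the job is moving rather than hung, and the memory footprint stays small because only one slice is formatted at a time.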
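
And a sketch of Ben Bolker's HDF5 suggestion. I am using the Bioconductor package `rhdf5` here (my substitution; the comment only says `hdf5`), writing each column as its own dataset since HDF5 has no native data-frame type. The file and dataset names are assumptions; MATLAB can read the result with its built-in `h5read`.

```r
# BiocManager::install("rhdf5")     # install once, from Bioconductor
library(rhdf5)
library(data.table)

DT <- data.table(a = rnorm(1e6), b = runif(1e6))  # same stand-in table as above

h5file <- "big_table.h5"            # hypothetical output path
h5createFile(h5file)

## one HDF5 dataset per column; numeric columns map directly,
## character columns are written as strings
for (col in names(DT)) {
  h5write(DT[[col]], h5file, col)
}

h5ls(h5file)                        # list the datasets that were written
```

The advantage over .RData is that the resulting file can be opened directly from MATLAB and most other numerical environments, not just from R.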