
I currently have a 400 million row by 8 column R `data.table` that is around 13 GB. I have tried `write.csv` to no avail. Afterwards, I read here about someone using SQLite, but even that didn't work. My main motivation is to convert it into something that can easily be read by Matlab, S-Plus, etc. Saving it as `.RData` is very efficient, compressing it to 900 MB. Is there a similarly compact format that other programs can read? Thanks!

  • maybe `hdf5`? A few more details about the contents (e.g. is it entirely numeric/double or does it contain other data types?) might be useful. – Ben Bolker Oct 06 '14 at 00:05
  • 1
    What is the problem with `write.csv`? It fails or freezes? You could open a file connection and print rows by chunks of a few thousands. I doubt this will ever fail. You can put a `print` in the loop to get an idea of how long it will take. And when you are done writing the csv, zip it. – flodel Oct 06 '14 at 00:10
  • @flodel so you mean like a loop or opening a looped file connection? – user1398057 Oct 06 '14 at 00:30
  • 1
    What do you mean "to no avail" or "that didn't work"? Did they fail with an error? Take too long? Create too large of an output file? – josliber Oct 06 '14 at 00:35
  • To echo @josliber, what was the specific error when trying `write.csv` (you do know this operation will take some time, right?) and what didn't work from the SO answer that uses `RSQLite`? – hrbrmstr Oct 06 '14 at 02:45
  • 1
    the write.csv command would take forever. I was using an Amazon server with 244 GB of RAM and I stopped the process after 2 hours. I was wondering if there was an easier way. – user1398057 Oct 06 '14 at 03:33
  • 1
    http://stackoverflow.com/questions/12013953/write-csv-for-large-data-table-in-r – shadowtalker Oct 06 '14 at 04:44
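Below is a minimal R sketch of the chunked-write idea from flodel's comment. The table name `DT`, the file name `big_table.csv`, and the 1-million-row chunk size are placeholders, not anything from the question; the point is only that writing through a single open connection avoids building the whole text representation in memory at once and gives visible progress.

```r
library(data.table)

## Write a data.table to CSV in row chunks through one open connection.
## `DT`, `path`, and `chunk_rows` are illustrative names; tune the chunk size.
write_csv_chunked <- function(DT, path, chunk_rows = 1e6L) {
  con <- file(path, open = "w")
  on.exit(close(con))
  starts <- seq(1L, nrow(DT), by = chunk_rows)
  for (i in seq_along(starts)) {
    rows <- starts[i]:min(starts[i] + chunk_rows - 1L, nrow(DT))
    # the connection stays open, so each call appends; header only on chunk 1
    write.table(DT[rows], con, sep = ",",
                col.names = (i == 1L), row.names = FALSE)
    print(sprintf("wrote chunk %d of %d", i, length(starts)))  # rough progress
  }
}

# write_csv_chunked(DT, "big_table.csv")
```

For the cross-program goal, Ben Bolker's `hdf5` suggestion is also worth noting: Matlab can read HDF5 files natively (`h5read`), so writing the columns out from R with an HDF5 package (e.g. Bioconductor's `rhdf5`) would avoid the text round-trip entirely.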

0 Answers