
I have a large data set; one of the files is 5 GB. Can someone suggest how to quickly read it into R (RStudio)? Thanks

Tue Nguyen
  • Take a look [here](http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r), though some new packages have appeared since then. – David Arenburg Jun 22 '15 at 13:59
  • It depends on what you want to do with the file and on its contents. If it's a CSV file you can use `sqldf` to filter it as if it were a SQLite database and only load what you want. If you *must* load everything, `fread` is still the fastest (see the sketch after these comments). – Panagiotis Kanavos Jun 22 '15 at 14:03
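A rough sketch of both approaches mentioned in the comments, assuming a CSV file named `big.csv` with a numeric column `value` (both names are placeholders):

```r
# Option 1: filter while reading with sqldf, so only matching rows enter R.
library(sqldf)
subset_df <- read.csv.sql("big.csv",
                          sql = "select * from file where value > 100")

# Option 2: read the whole file with data.table::fread, which is typically
# much faster than read.csv for large files.
library(data.table)
dt <- fread("big.csv")
```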

2 Answers


If you only have 4 GB of RAM you cannot put 5 GB of data 'into R'. Alternatively, look at the 'Large memory and out-of-memory data' section of the High Performance Computing CRAN task view. Packages designed for out-of-memory processing, such as ff, may help you. Otherwise you can use Amazon AWS services to buy computing time on a larger machine.
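For illustration, a minimal sketch of the out-of-memory approach with `ff`, assuming a CSV file named `big.csv` (the file name and chunk sizes are placeholders):

```r
library(ff)

# Read the file in chunks into a file-backed ffdf object rather than RAM.
big_ffdf <- read.csv.ffdf(file = "big.csv",
                          header = TRUE,
                          first.rows = 10000,  # rows used to guess column types
                          next.rows = 50000)   # chunk size for the remaining reads

dim(big_ffdf)      # dimensions are available without loading the data
big_ffdf[1:5, ]    # only the indexed rows are pulled into memory
```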

Steve Bronder
  • Actually, you can if you use Revolution R's distribution; it doesn't require all the data to be loaded in memory. This won't make loading 4 GB at once any faster, though. – Panagiotis Kanavos Jun 22 '15 at 14:22
  • No, you cannot. Revolution R uses out-of-memory algorithms like ff does. You still cannot load more data than you have RAM. – Steve Bronder Jun 22 '15 at 14:32

My package `filematrix` is made for working with matrices while storing them in files in binary format. The function `fm.create.from.text.file` reads a matrix from a text file and stores it in a binary file without loading the whole matrix into memory. It can then be accessed in parts with the usual subscripting, `fm[1:4, 1:3]`, or quickly loaded into memory as a whole with `fm[]`.
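A minimal usage sketch, assuming a tab-delimited numeric matrix stored in `matrix.txt`; the file names are placeholders:

```r
library(filematrix)

# Convert the text file to a file-backed binary matrix without reading it
# all into memory; the second argument is the base name for the binary files.
fm <- fm.create.from.text.file("matrix.txt", "matrix_fm")

fm[1:4, 1:3]   # read just a small block from disk
# m <- fm[]    # or load the whole matrix into memory, if it fits
close(fm)      # release the file-backed matrix
```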

Andrey Shabalin