I have a large data set; one of the files is 5GB. Can someone suggest how to quickly read it into R (RStudio)? Thanks
-
Take a look [here](http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r) maybe, though some new packages have appeared since then. – David Arenburg Jun 22 '15 at 13:59
-
Depends on what you want to do with it and the file contents. If it's a CSV file you can use `sqldf` to filter it as if it were a SQLite database and only load what you want. If you *must* load everything, `fread` is still the fastest. – Panagiotis Kanavos Jun 22 '15 at 14:03
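A minimal sketch of both approaches from the comment above, assuming a CSV called `big_file.csv` with a numeric column `some_column` (both hypothetical names); in `read.csv.sql` the file is referred to as `file` inside the SQL:

```r
# Hypothetical file and column names; adjust to your data.
library(data.table)
library(sqldf)

# Load everything as fast as possible
dt <- fread("big_file.csv")

# Or filter with SQL while reading, so only the matching rows ever enter R
filtered <- read.csv.sql("big_file.csv",
                         sql = "select * from file where some_column > 100")
```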
2 Answers
If you only have 4 GB of RAM you cannot put 5 GB of data 'into R'. Alternatively, look at the 'Large memory and out-of-memory data' section of the CRAN High Performance Computing task view. Packages designed for out-of-memory processing, such as `ff`, may help you. Otherwise, you can use Amazon AWS to buy computing time on a larger machine.
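For example, a minimal sketch with `ff`, assuming a CSV called `big_file.csv` (hypothetical name); the chunking arguments are from memory of the `ff` API and may need adjusting:

```r
# Read a large CSV into an on-disk ffdf object instead of RAM.
library(ff)

big <- read.csv.ffdf(file = "big_file.csv", header = TRUE,
                     next.rows = 100000)   # process the file in 100k-row chunks

dim(big)              # dimensions are known without loading everything into RAM
chunk <- big[1:10, ]  # subscripting pulls just those rows into memory as a data.frame
```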

Steve Bronder
-
Actually, you can if you use Revolution R's distribution. It doesn't require all data to be loaded in memory. This won't make loading 4GB at once any faster though – Panagiotis Kanavos Jun 22 '15 at 14:22
-
No, you cannot. Revolution R uses out-of-memory algorithms like ff does. You still cannot load more data than you have RAM. – Steve Bronder Jun 22 '15 at 14:32
My package `filematrix` is made for working with matrices while storing them in files in binary format. The function `fm.create.from.text.file` reads a matrix from a text file and stores it in a binary file without loading the whole matrix into memory. It can then be accessed in parts using the usual subscripting `fm[1:4, 1:3]`, or loaded quickly into memory as a whole with `fm[]`.
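A minimal sketch of that workflow, assuming a tab-delimited numeric matrix in a hypothetical file `big_matrix.txt`; the function's defaults (row and column names in the first column/row, tab delimiter) may need adjusting for your file:

```r
library(filematrix)

# Convert the text file to the binary filematrix format on disk
fm <- fm.create.from.text.file(textfilename = "big_matrix.txt",
                               filenamebase = "big_matrix_fm")

fm[1:4, 1:3]   # read just a small block from disk
whole <- fm[]  # or load the entire matrix into memory at once
close(fm)      # release the file handle
```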

Andrey Shabalin