
I have 10 CSV files, each 2 GB in size. I know that if I combine the files there will be 48 million rows, because I have run the corresponding query in SQL myself. My computer has 16 GB of RAM. What are my options for working with these files in R? I understand that I can use SparkR or sparklyr, but I am still trying to understand that technology. Are there any other options?

xhr489
  • There are various methods to work with files stored on disk. See the HPC task view on CRAN for details. – Ralf Stubner Dec 01 '18 at 11:03
  • It partly depends on what you want to do. If your algorithms are easy to parallelize you can just iterate over the 10 files you have now without ever combining them (see the per-file sketch below). – Ista Dec 01 '18 at 20:10
  • Although you can process the data in Spark, try a simple approach first by using `fread` from `data.table` (see the sketch below). Check out this thread: https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes – Emer Dec 11 '18 at 20:30
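
Following Emer's comment, here is a minimal sketch of the `fread` approach. The file names `data01.csv` through `data10.csv` and the columns `id` and `value` are hypothetical placeholders; the `select` argument limits the read to the columns you actually need, which matters with 16 GB of RAM:

```r
library(data.table)

# Hypothetical file names; replace with your actual paths.
files <- sprintf("data%02d.csv", 1:10)

# Read each file with fread, keeping only the needed columns,
# then stack all ten into a single data.table.
combined <- rbindlist(lapply(files, fread, select = c("id", "value")))
```

Whether the combined table fits in memory depends on how many columns you keep and their types; 48 million rows of a few numeric columns is typically only a few GB.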
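
If the full data does not fit even after dropping columns, Ista's suggestion of iterating over the files applies whenever the computation decomposes per file, e.g. a group-wise sum. A minimal sketch, again using the hypothetical file names and placeholder columns from above:

```r
library(data.table)

files <- sprintf("data%02d.csv", 1:10)  # hypothetical names

# Summarise each file independently, so only one 2 GB file is in
# memory at a time; the per-file summaries are small.
per_file <- lapply(files, function(f) {
  dt <- fread(f)
  dt[, .(total = sum(value)), by = id]  # placeholder aggregation
})

# Combine the small summaries and aggregate once more.
result <- rbindlist(per_file)[, .(total = sum(total)), by = id]
```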
