
I have 10 CSV files, each 2 GB in size. I know that if I combine the files there will be 48 million rows, because I have run the corresponding query in SQL myself. My computer has 16 GB of RAM. What are my options for working with these files in R? I understand that I can use SparkR or sparklyr, but I am still trying to understand that technology. Are there any other options?

xhr489
  • There are various methods to work with files stored on disk. See the HPC task view on CRAN for details. – Ralf Stubner Dec 01 '18 at 11:03
  • It partly depends on what you want to do. If your algorithms are easy to parallelize you can just iterate over the 10 files you have now without ever combining them (see the per-file sketch below). – Ista Dec 01 '18 at 20:10
  • Although you can process the data in Spark, try a simple approach first by using `fread` from `data.table` (see the sketch below). Check out this thread: https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes – Emer Dec 11 '18 at 20:30
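
Following Emer's comment, here is a minimal sketch of the `fread` approach. The file names `data01.csv` through `data10.csv` and the columns `id` and `value` are hypothetical placeholders; the `select` argument limits the read to the columns you actually need, which matters with 16 GB of RAM:

```r
library(data.table)

# Hypothetical file names; replace with your actual paths.
files <- sprintf("data%02d.csv", 1:10)

# Read each file with fread, keeping only the needed columns,
# then stack all ten into a single data.table.
combined <- rbindlist(lapply(files, fread, select = c("id", "value")))
```

Whether the combined table fits in memory depends on how many columns you keep and their types; 48 million rows of a few numeric columns is typically only a few GB.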
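
If the full data does not fit even after dropping columns, Ista's suggestion of iterating over the files applies whenever the computation decomposes per file, e.g. a group-wise sum. A minimal sketch, again using the hypothetical file names and placeholder columns from above:

```r
library(data.table)

files <- sprintf("data%02d.csv", 1:10)  # hypothetical names

# Summarise each file independently, so only one 2 GB file is in
# memory at a time; the per-file summaries are small.
per_file <- lapply(files, function(f) {
  dt <- fread(f)
  dt[, .(total = sum(value)), by = id]  # placeholder aggregation
})

# Combine the small summaries and aggregate once more.
result <- rbindlist(per_file)[, .(total = sum(total)), by = id]
```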
