
EDIT: This question is not a duplicate, as merely reading the data is not the problem.

I want to analyze a CSV file of around 10 GB in R. I am working on a GCE virtual machine that has 60 GB of memory.

I would like to know which R library is suitable for reading such large files and performing operations like filter, group_by, colMeans, etc. on them.
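For context, this is roughly the workflow I have in mind (a minimal sketch; data.csv and the columns group and value are just placeholders, not my real schema):

    library(dplyr)

    df <- read.csv("data.csv")              # plain, single-threaded read

    # filter -> group by -> per-group mean, i.e. the kind of operations I need
    result <- df %>%
      filter(value > 0) %>%
      group_by(group) %>%
      summarise(mean_value = mean(value))

    # overall column means across all numeric columns
    col_means <- colMeans(df[sapply(df, is.numeric)])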

Which of the following would be the best choice (given that memory is not a constraint)?

  1. Stick with read.csv and packages like dplyr or the apply family.
  2. Use packages like ff or bigmemory for parallel processing (a rough sketch of what I mean follows this list).
  3. Use RSpark or any other distributed computing framework.
  4. Any other methodology that is perfectly suited for this.
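
For option 2, something along these lines is what I am picturing (again only a sketch: data.csv, the column name value, and the backing-file names are placeholders, and I assume the bigmemory and biganalytics packages):

    library(bigmemory)
    library(biganalytics)

    # Memory-map the file instead of holding it all in RAM as one data frame.
    # A big.matrix stores a single numeric type, so character columns would
    # have to be dropped or encoded first.
    x <- read.big.matrix("data.csv", header = TRUE, type = "double",
                         backingfile = "data.bin", descriptorfile = "data.desc")

    means <- colmean(x)                  # column means over the whole file

    # Filtering: mwhich() returns row indices, e.g. rows where value > 0;
    # the subset extracted with [ is an ordinary R matrix again.
    keep <- mwhich(x, "value", 0, "gt")
    filtered_means <- colMeans(x[keep, , drop = FALSE])

    # Grouped summaries would need extra work (e.g. the bigtabulate package).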
