
I have several large (600,000+ rows, ~50 columns) CSV files that I import into R through read.csv(). Each read takes precious minutes of my time, so I would like to speed up this step as much as possible. One thing I've done is identify in advance the columns I don't want and prevent R from reading them. Thanks to an answer on Cross Validated, I've come up with this ugly thing to do so:

 > data <- read.csv('data.csv', colClasses=c(rep("NULL", 2), rep(NA, 2),
                                             rep("NULL", 17), rep(NA, 1),
                                             rep("NULL", 28)))

This made the process noticeably faster, but it is still not fast enough. Is there anything else I can do? I'm working on a good machine (2 GHz Intel Xeon, 24 GB RAM) and am a bit disappointed at having to wait so long to import a data set that isn't even really huge.
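
For what it's worth, a less ugly way to build the same colClasses vector might be to read only the header first and mark the unwanted columns by name rather than counting positions by hand (the column names below are made up; I haven't benchmarked whether the construction itself matters):

    # A sketch: grab the column names from the first row, then build
    # colClasses by name. "id", "value", "date" are hypothetical names
    # standing in for the columns I actually want to keep.
    header <- names(read.csv('data.csv', nrows = 1))
    keep <- c("id", "value", "date")
    # NA lets R guess the type as usual; "NULL" skips the column entirely
    classes <- ifelse(header %in% keep, NA, "NULL")
    data <- read.csv('data.csv', colClasses = classes)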
