I have several large CSV files (600,000+ rows, ~50 columns) that I import into R through read.csv(). Each read takes precious minutes, so I would like to speed this step up as much as possible. One thing I've done is identify in advance the columns I don't need and prevent R from reading them. Thanks to an answer on Cross Validated, I've come up with this ugly thing to do that:
data <- read.csv('data.csv',
                 colClasses = c(rep("NULL", 2), rep(NA, 2),
                                rep("NULL", 17), rep(NA, 1),
                                rep("NULL", 28)))
This made the process noticeably faster, but it is still not fast enough. Is there anything else I can do? I'm working on a decent machine (2 GHz Intel Xeon, 24 GB RAM) and am a bit disappointed at having to wait so long to import a data set that isn't even really huge.