I have several large CSV files (600,000+ rows, ~50 columns) that I import into R through read.csv(). Each read takes precious minutes, so I would like to speed this step up as much as possible. One thing I've done is identify in advance the columns I don't need and prevent R from reading them. Thanks to an answer on Cross Validated, I've come up with this ugly thing to do that:
data <- read.csv('data.csv',
                 colClasses = c(rep("NULL", 2), rep(NA, 2),
                                rep("NULL", 17), rep(NA, 1),
                                rep("NULL", 28)))
This made the process noticeably faster, but it is still not fast enough. Is there anything else I can do? I'm working on a decent machine (2 GHz Intel Xeon, 24 GB RAM) and am a bit disappointed at having to wait so long to import a data set that isn't even really huge.