I have 350 CSV files, each containing between 10,000 and 700,000 rows. I would like to read a subset of each file into R. My method so far is:
library(dplyr)
to_load <- Sys.glob("data/*.csv")
data <- data_frame(file = to_load) %>%
  rowwise() %>%
  do(read.csv(.$file) %>% filter(condition == "a"))
When I try this out with just the first 6 files, the estimated completion time reported by do() is 3 minutes, which extrapolates to roughly 3 hours for all 350 files. My question is whether there is a more efficient approach. I'm open to trying just about anything.
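For reference, one alternative I've been wondering about is data.table::fread, which is generally much faster than read.csv. A minimal sketch of what I have in mind (assuming every file has a condition column, as in my example above):

library(data.table)

to_load <- Sys.glob("data/*.csv")

# Read each file with fread, keep only the rows where condition == "a",
# then stack the per-file results into a single table
data <- rbindlist(lapply(to_load, function(f) fread(f)[condition == "a"]))

I haven't benchmarked this against the dplyr version yet, though, so I don't know how much it would actually save.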