I am reading more than 1,500 CSV files into a single data frame. I keep reading that fread is very fast at reading CSV files, but when I try it, it is much slower than read.csv. For example, given a list of file paths, file_paths, here are the timings for reading a single CSV file:
> system.time(fread(file_paths[[1]]))
user system elapsed
0.00 0.00 0.64
> system.time(read.csv(file_paths[[1]]))
user system elapsed
0.00 0.00 0.03
It gets much worse when I run this over the full list of files. With fread (called with header=FALSE), I ended up stopping the computation after about 488 seconds, while read.csv with bind_rows finished in about 110 seconds:
> system.time(rbindlist(lapply(unlist(file_paths), fread, header=FALSE)))
Timing stopped at: 1.62 11.83 488.8
> system.time(bind_rows(lapply(unlist(file_paths), read.csv)))
user system elapsed
1.81 2.37 109.83
Why is fread so much slower than read.csv here?
Also, even the read.csv version takes longer than I would like, so if anyone has a suggestion to speed this up, I'd appreciate it.
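In case it helps anyone reproduce the comparison, here is a minimal sketch on synthetic data, assuming data.table is installed. The temporary directory, the file count and the file contents are made up for illustration and are not my real files. Both readers are called with their default arguments so the comparison is like for like (in my timings above, fread was given header=FALSE and read.csv was not).

```r
library(data.table)

# Hypothetical stand-in data: a few small CSV files in a temporary directory.
tmp_dir <- tempfile("csv_bench_")
dir.create(tmp_dir)
n_files <- 5
for (i in seq_len(n_files)) {
  df <- data.frame(id = 1:100, value = rnorm(100))
  write.csv(df, file.path(tmp_dir, sprintf("file_%02d.csv", i)),
            row.names = FALSE)
}
file_paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Time each reader over the same files and bind the results the same way.
t_fread <- system.time(dt_fread <- rbindlist(lapply(file_paths, fread)))
t_base  <- system.time(dt_base  <- rbindlist(lapply(file_paths, read.csv)))

# Elapsed wall-clock seconds for each approach.
print(t_fread["elapsed"])
print(t_base["elapsed"])
```

On small local files like these I would not expect the two timings to differ much, so if they do differ on the real files, that would point at something specific to where or how those files are stored rather than at the readers themselves.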