I am reading over 1500 CSV files into a single data frame, and I keep reading that fread is very fast at reading CSV files. But when I try fread, it is much slower than read.csv. For example, given a list of file paths, file_paths, here are the timings for reading a single CSV file:

> system.time(fread(file_paths[[1]]))
   user  system elapsed 
   0.00    0.00    0.64 

> system.time(read.csv(file_paths[[1]]))
   user  system elapsed 
   0.00    0.00    0.03 
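
To make the single-file comparison reproducible, here is a minimal sketch. It assumes a small synthetic CSV written to a temporary file; the file, its columns, and the repeat count are hypothetical stand-ins, not my actual data. Repeating each read a few times should reduce the effect of a one-off delay on the first call.

```r
library(data.table)

# Hypothetical stand-in for one of the files: a small CSV in a temp location
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = rnorm(1000)), tmp, row.names = FALSE)

# Time each reader on the same file, repeated so a single slow call does not dominate
t_fread   <- system.time(for (i in 1:20) dt <- fread(tmp))
t_readcsv <- system.time(for (i in 1:20) df <- read.csv(tmp))

print(t_fread)
print(t_readcsv)

# Both readers should return the same shape
stopifnot(nrow(dt) == nrow(df), ncol(dt) == ncol(df))
```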

It gets much worse when I do this over the full list of files. With fread, I ended up stopping the computation after 488 seconds.

> system.time(rbindlist(lapply(unlist(file_paths), fread, header=FALSE)))

Timing stopped at: 1.62 11.83 488.8

> system.time(bind_rows(lapply(unlist(file_paths), read.csv)))
   user  system elapsed 
   1.81    2.37  109.83 
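
Note that the two calls above are not quite like-for-like: the fread call passes header=FALSE, while read.csv uses its default header handling. A sketch of the same comparison with identical header settings follows. It assumes a handful of small synthetic files in a temporary directory (the paths and contents are hypothetical) and uses base rbind in place of bind_rows to avoid an extra dependency.

```r
library(data.table)

# Hypothetical stand-ins for file_paths: a few small CSVs in a temp directory
csv_dir <- tempfile("csvs")
dir.create(csv_dir)
paths <- file.path(csv_dir, sprintf("part_%02d.csv", 1:5))
for (p in paths) {
  write.csv(data.frame(id = 1:100, value = runif(100)), p, row.names = FALSE)
}

# Same header handling in both readers, so only the reader itself differs
t_dt <- system.time(all_dt <- rbindlist(lapply(paths, fread, header = TRUE)))
t_df <- system.time(all_df <- do.call(rbind, lapply(paths, read.csv, header = TRUE)))

print(t_dt)
print(t_df)

# Both approaches should stack the same number of rows
stopifnot(nrow(all_dt) == nrow(all_df))
```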

What is the deal with this?

Also, even the read.csv approach takes longer than I'd like, so if anyone has a suggestion to speed this up, I'd appreciate it.

CopyOfA
  • See also: https://stackoverflow.com/questions/24697350/r-fread-data-table-inconsistent-speed – NelsonGon May 07 '20 at 17:32
  • The post about why fread is fast did not answer my question, and the post about inconsistent speed is interesting, but fread has always been slower on this data for me. And even importing a single csv file is slower with fread than with read.csv. I am left wondering if I am using fread correctly or if this is a common experience. – CopyOfA May 07 '20 at 20:05

0 Answers