
I have 20 large CSV files (100-150 MB each) that I would like to load into R and rbind into one large data frame for analysis. Reading each CSV file uses only one core and takes about 7 minutes. I am on 64-bit 8-core Linux with 16 GB RAM, so resources should not be an issue.

Is there any way to perform this process more efficiently? I am also open to other (open-source, Linux) software, for example binding the CSV files in a different program and then loading the result into R, or anything else that could make this process faster.

Thank you very much

Pop
ECII
  • 5
    See this answer: http://stackoverflow.com/a/1820610/602276 – Andrie Sep 06 '12 at 08:00
  • Out of curiosity, what function(s) are you using that you are waiting 7 minutes? – Roman Luštrik Sep 06 '12 at 11:35
  • 1
    Andrie's link helps with the reading .csv part, and http://stackoverflow.com/a/12252047/403310 should help with the `rbind` part. You can use `rbindlist` on `data.frame` as well as `data.table`. – Matt Dowle Sep 06 '12 at 13:00
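Putting the two comment suggestions together, one approach is to read the files with `fread` from the data.table package (much faster than `read.csv`), spread the reads across cores with `parallel::mclapply`, and stack the results with `rbindlist`, which avoids the quadratic cost of repeated `rbind` calls. A minimal sketch, assuming the files sit in a hypothetical `data/` directory and share the same columns:

```r
library(data.table)
library(parallel)

# List the 20 CSV files (adjust the path/pattern to your setup).
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

# Read them in parallel, one fread() per worker (Linux only; mclapply forks).
dt_list <- mclapply(files, fread, mc.cores = 8)

# Stack all pieces into one data.table in a single pass.
big <- rbindlist(dt_list)
```

Even without the parallel step, `fread` plus a single `rbindlist` call should cut the 7-minutes-per-file load time substantially.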

1 Answer


Maybe you want a tool like `paste`. It's a standard Unix command (not a bash function) that merges corresponding lines of files side by side. Note, though, that for stacking files row-wise, which is what `rbind` does, `cat` is the appropriate command.
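A minimal sketch of the row-stacking variant, assuming all files share the same header and column order (the `part*.csv` glob and `combined.csv` name are placeholders):

```shell
# Take the header once, from the first file.
head -n 1 part1.csv > combined.csv

# Append the data rows (everything after the header) of every file.
for f in part*.csv; do
  tail -n +2 "$f" >> combined.csv
done
```

The resulting `combined.csv` can then be loaded into R with a single read.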

John Carter
  • 53,924
  • 26
  • 111
  • 144
Alan
  • 3,153
  • 2
  • 15
  • 11