
I have a series of csv files (one per annum) with the same column headers and different numbers of rows. Originally I was reading them in and merging them like so:

setwd("N:/Ring data by cruise/Shetland")

LengthHeight2013 <- read.csv("N:/Ring data by cruise/Shetland/R_0113A_S2013_WD.csv", sep = ",", header = TRUE)
LengthHeight2012 <- read.csv("N:/Ring data by cruise/Shetland/R_0212A_S2012_WD.csv", sep = ",", header = TRUE)
LengthHeight2011 <- read.csv("N:/Ring data by cruise/Shetland/R_0211A_S2011_WOD.csv", sep = ",", header = TRUE)
LengthHeight2010 <- read.csv("N:/Ring data by cruise/Shetland/R_0310A_S2010_WOD.csv", sep = ",", header = TRUE)
LengthHeight2009 <- read.csv("N:/Ring data by cruise/Shetland/R_0309A_S2009_WOD.csv", sep = ",", header = TRUE)

LengthHeight <- merge(LengthHeight2013,LengthHeight2012,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2011,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2010,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2009,all=TRUE)

I would like to know if there is a shorter/tidier way to do this, especially since each time I run the script I might want to look at a different range of years.

I also found this bit of code by Tony Cookson which looks like it would do what I want; however, the data frame it produces has the correct headers but no data rows.

multmerge = function(mypath){
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE)})
  Reduce(function(x, y) {merge(x, y)}, datalist)
}

mymergeddata = multmerge("C://R//mergeme")
– helen.h

4 Answers


Find files (list.files) and read the files in a loop (lapply), then call (do.call) row bind (rbind) to put all files together by rows.

myMergedData <-
  do.call(rbind,
          lapply(list.files(path = "N:/Ring data by cruise",
                            pattern = "\\.csv$", full.names = TRUE),
                 read.csv))
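
Since the question mentions looking at a different range of years on each run, and the year appears in each file name (e.g. S2013), here is a minimal sketch for restricting which files are read; the years vector and the regular expression are assumptions based on the file names shown in the question:

years <- 2010:2013  # range of years to include on this run
allFiles <- list.files(path = "N:/Ring data by cruise/Shetland",
                       pattern = "\\.csv$", full.names = TRUE)
# keep only files whose name contains S followed by one of the chosen years
keep <- grepl(paste0("S(", paste(years, collapse = "|"), ")"), basename(allFiles))
myMergedData <- do.call(rbind, lapply(allFiles[keep], read.csv))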

Update: there is the vroom package; according to its manual it is much faster than data.table::fread and base read.csv. The syntax looks neat, too:

library(vroom)
files <- list.files(path = "N:/Ring data by cruise",
                    pattern = "\\.csv$", full.names = TRUE)  # collect the csv paths first
myMergedData <- vroom(files)
– zx8754
  • I have a similar issue and use zoneB = list.files(pattern = "*.csv"), then for (i in 1:length(zoneB)) assign(zoneB[i], read.csv(zoneB[i])), and then ZB = lapply(zoneB, read_csv) %>% bind_rows() ... but I wonder: is there a way to exclude specific .csv files from the list? (see the sketch below) – Lmm May 28 '19 at 22:50
  • What if I need to pass parameters to the function in lapply? E.g. instead of read.csv I would like to use read.table and set some options on it. – Rafael Santos Jul 29 '19 at 16:50
  • Hey zx8754 - thank you for this! Is there a way to keep an id on each document that the folder has? – MelaniaCB Nov 05 '20 at 21:13
  • @MelaniaCB Certainly possible, please ask a new question with reference to this post, and provide example input and expected output. – zx8754 Nov 06 '20 at 08:06
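
Regarding the first comment, excluding specific files can be done with plain set operations on the file vector before reading; a minimal sketch, with hypothetical file names:

zoneB <- list.files(pattern = "\\.csv$")
zoneB <- setdiff(zoneB, c("skip_this_one.csv", "skip_that_one.csv"))  # drop named files
# or exclude by pattern, e.g. everything from 2009:
# zoneB <- zoneB[!grepl("2009", zoneB)]

And regarding keeping an id per file (third comment), vroom can store each row's source path in a column via its id argument, e.g. vroom(files, id = "path").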

If you're looking for speed, then try this:

library(data.table) ## 1.9.2 or 1.9.3
filenames <- list.files("N:/Ring data by cruise",
                        pattern = "\\.csv$", full.names = TRUE)  # csv paths, as in the answer above
ans = rbindlist(lapply(filenames, fread))
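
If the yearly files don't all share exactly the same columns, rbindlist can align by column name and pad the gaps; continuing from the snippet above, a minimal sketch:

ans = rbindlist(lapply(filenames, fread), use.names = TRUE, fill = TRUE)  # fill pads missing columns with NA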
– Arun

You could try it in a tidy way like this:

library(readr)
library(dplyr)

files <- list.files("path of your files",
                    pattern = ".csv$", recursive = TRUE, full.names = TRUE)

myMergedData <- read_csv(files) %>% bind_rows()
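
If you also want to record which file each row came from, readr (2.0 or later) can store the path in a column via read_csv's id argument; a minimal sketch, where the column name "source_file" is just an assumption:

myMergedData <- read_csv(files, id = "source_file")  # adds a column with each row's file path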
– Michael

I don't have enough rep to comment, but to answer Rafael Santos: you can use the code in Using lapply and read.csv on multiple files (in R) to add parameters to the lapply call in the answer above.
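
A minimal sketch of that idea: lapply forwards any extra arguments to the function it applies, so read.table options can be passed directly (the sep and na.strings values here are only illustrative assumptions):

datalist <- lapply(filenames, read.table, sep = ",", header = TRUE, na.strings = "")
merged <- do.call(rbind, datalist)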

– CTDataGuy