
I have a series of csv files (one per annum) with the same column headers and different numbers of rows. Originally I was reading them in and merging them like so:

setwd("N:/Ring data by cruise/Shetland")

LengthHeight2013 <- read.csv("N:/Ring data by cruise/Shetland/R_0113A_S2013_WD.csv", sep = ",", header = TRUE)
LengthHeight2012 <- read.csv("N:/Ring data by cruise/Shetland/R_0212A_S2012_WD.csv", sep = ",", header = TRUE)
LengthHeight2011 <- read.csv("N:/Ring data by cruise/Shetland/R_0211A_S2011_WOD.csv", sep = ",", header = TRUE)
LengthHeight2010 <- read.csv("N:/Ring data by cruise/Shetland/R_0310A_S2010_WOD.csv", sep = ",", header = TRUE)
LengthHeight2009 <- read.csv("N:/Ring data by cruise/Shetland/R_0309A_S2009_WOD.csv", sep = ",", header = TRUE)

LengthHeight <- merge(LengthHeight2013,LengthHeight2012,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2011,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2010,all=TRUE)
LengthHeight <- merge(LengthHeight,LengthHeight2009,all=TRUE)

I would like to know if there is a shorter/tidier way to do this, especially since each time I run the script I might want to look at a different range of years.

I also found this bit of code by Tony Cookson which looks like it would do what I want; however, the data frame it produces has the correct headers but no data rows.

multmerge = function(mypath){
  filenames = list.files(path = mypath, full.names = TRUE)
  datalist = lapply(filenames, function(x){read.csv(file = x, header = TRUE)})
  Reduce(function(x, y) {merge(x, y)}, datalist)
}

mymergeddata = multmerge("C://R//mergeme")
– helen.h

4 Answers


Find files (list.files) and read the files in a loop (lapply), then call (do.call) row bind (rbind) to put all files together by rows.

myMergedData <-
  do.call(rbind,
          lapply(list.files(path = "N:/Ring data by cruise",
                            pattern = "\\.csv$", full.names = TRUE),
                 read.csv))
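
Since the question mentions looking at a different range of years on each run, and the year appears in each file name (e.g. S2013), here is a minimal sketch for restricting which files are read; the years vector and the regular expression are assumptions based on the file names shown in the question:

years <- 2010:2013  # range of years to include on this run
allFiles <- list.files(path = "N:/Ring data by cruise/Shetland",
                       pattern = "\\.csv$", full.names = TRUE)
# keep only files whose name contains S followed by one of the chosen years
keep <- grepl(paste0("S(", paste(years, collapse = "|"), ")"), basename(allFiles))
myMergedData <- do.call(rbind, lapply(allFiles[keep], read.csv))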

Update: there is the vroom package; according to its manual it is much faster than data.table::fread and base read.csv. The syntax looks neat, too:

library(vroom)
files <- list.files(path = "N:/Ring data by cruise",
                    pattern = "\\.csv$", full.names = TRUE)  # collect the csv paths first
myMergedData <- vroom(files)
– zx8754
  • I have a similar issue and use zoneB = list.files(pattern = "*.csv"), then for (i in 1:length(zoneB)) assign(zoneB[i], read.csv(zoneB[i])), and then ZB = lapply(zoneB, read_csv) %>% bind_rows() ... but I wonder: is there a way to exclude specific .csv files from the list? (see the sketch below) – Lmm May 28 '19 at 22:50
  • What if I need to pass parameters to the function in lapply? E.g. instead of read.csv I would like to use read.table and set some options on it. – Rafael Santos Jul 29 '19 at 16:50
  • Hey zx8754 - thank you for this! Is there a way to keep an id on each document that the folder has? – MelaniaCB Nov 05 '20 at 21:13
  • @MelaniaCB Certainly possible, please ask a new question with reference to this post, and provide example input and expected output. – zx8754 Nov 06 '20 at 08:06
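
Regarding the first comment, excluding specific files can be done with plain set operations on the file vector before reading; a minimal sketch, with hypothetical file names:

zoneB <- list.files(pattern = "\\.csv$")
zoneB <- setdiff(zoneB, c("skip_this_one.csv", "skip_that_one.csv"))  # drop named files
# or exclude by pattern, e.g. everything from 2009:
# zoneB <- zoneB[!grepl("2009", zoneB)]

And regarding keeping an id per file (third comment), vroom can store each row's source path in a column via its id argument, e.g. vroom(files, id = "path").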

If you're looking for speed, then try this:

library(data.table) ## 1.9.2 or 1.9.3
filenames <- list.files("N:/Ring data by cruise",
                        pattern = "\\.csv$", full.names = TRUE)  # csv paths, as in the answer above
ans = rbindlist(lapply(filenames, fread))
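
If the yearly files don't all share exactly the same columns, rbindlist can align by column name and pad the gaps; continuing from the snippet above, a minimal sketch:

ans = rbindlist(lapply(filenames, fread), use.names = TRUE, fill = TRUE)  # fill pads missing columns with NA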
– Arun

You could try it in a tidy way like this:

library(readr)
library(dplyr)

files <- list.files("path of your files",
                    pattern = ".csv$", recursive = TRUE, full.names = TRUE)

myMergedData <- read_csv(files) %>% bind_rows()
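
If you also want to record which file each row came from, readr (2.0 or later) can store the path in a column via read_csv's id argument; a minimal sketch, where the column name "source_file" is just an assumption:

myMergedData <- read_csv(files, id = "source_file")  # adds a column with each row's file path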
– Michael

I don't have enough rep to comment, but to answer Rafael Santos: you can use the code in Using lapply and read.csv on multiple files (in R) to add parameters to the lapply call in the answer above.
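
A minimal sketch of that idea: lapply forwards any extra arguments to the function it applies, so read.table options can be passed directly (the sep and na.strings values here are only illustrative assumptions):

datalist <- lapply(filenames, read.table, sep = ",", header = TRUE, na.strings = "")
merged <- do.call(rbind, datalist)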

– CTDataGuy