I'm trying to merge many CSV files in R. They all share a common column, and merge() works fine when I manually type in the names of two of the files. However, I have too many files to type out all the names every time I need to do this.

This works just fine:

test <- merge(quant_dysmorph_data.csv, srs_adult.csv, by = "individual", all = TRUE)

I can make something similar as input for the merge command that looks fine too:

cat(ls(pattern = ".csv"), sep = ",")

Returns: bapq.csv,bapq_raw.csv,bapq_recode.csv,fhi_informant.csv,fhi_interviewer.csv,fhi_subject.csv,quant_dysmorph_data.csv,srs_adult.csv (and so on and so on. Sorry, the comment box won't format it as output correctly...)

However, when I use this as input for the merge command I get an error:

x <- merge(cat(ls(pattern = ".csv"), sep = ","), by = "individual", all = TRUE)

Returns:

Error in as.data.frame(y) : argument "y" is missing, with no default
7. as.data.frame(y)
6. as.data.frame(y)
5. nrow(y <- as.data.frame(y))
4. merge.data.frame(as.data.frame(x), as.data.frame(y), ...)
3. merge(as.data.frame(x), as.data.frame(y), ...)
2. merge.default(cat(ls(pattern = ".csv"), sep = ","), by = "individual", all = TRUE)
1. merge(cat(ls(pattern = ".csv"), sep = ","), by = "individual", all = TRUE)

Thanks in advance for your help.

Brian
  • `cat` is used to print to screen. It returns the NULL object. You probably want to be working with a list of data.frames. See [this post](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames) for more info. – lmo Aug 04 '17 at 16:18
  • I've done something sort of like this before, where I kept ~40 .csv files in their own subdirectory, then used setwd() and dir() to get a vector of file names that I'd loop through. – Christopher Anderson Aug 04 '17 at 17:41
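To illustrate the first comment, here is a minimal sketch using two small made-up data frames. The names a.csv and b.csv and their columns are invented for the example and only stand in for the imported objects in the question. cat() prints and returns NULL, so merge() never receives a second data frame; mget() fetches the objects named by ls() into a list, and Reduce() then merges that list pairwise.

```r
# Two toy data frames standing in for the imported files (hypothetical data).
a.csv <- data.frame(individual = c(1, 2, 3), height = c(170, 165, 180))
b.csv <- data.frame(individual = c(2, 3, 4), score = c(10, 20, 30))

# cat() prints its arguments and returns NULL, so it cannot feed merge().
printed <- cat(ls(pattern = "\\.csv$"), sep = ",")
is.null(printed)

# mget() looks the objects up by name and returns them as a named list;
# Filter() keeps only the data frames in case other objects match the pattern.
df_list <- Filter(is.data.frame, mget(ls(pattern = "\\.csv$")))

# Reduce() applies merge() pairwise across the whole list.
merged <- Reduce(function(x, y) merge(x, y, by = "individual", all = TRUE),
                 df_list)
merged
```

The pattern "\\.csv$" is used instead of ".csv" because the unescaped dot matches any character, so ".csv" would also pick up an object named, say, xcsv_notes.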

1 Answer

If your solution has to use ls(), then this won't help.

However, if you're just looking to merge a folder full of .csv files, all of which should be merged together by a common variable, then you can do this:

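setwd('Drive:/Folder/Subfolder')

# Names of all files in the working directory that end in ".csv"
fnms <- list.files(pattern = "\\.csv$")

# Start from the first file, then merge each remaining file into x
x <- read.csv(fnms[1])

# seq_along(fnms)[-1] is empty when there is only one file, unlike 2:length(fnms)
for (i in seq_along(fnms)[-1]) {
  temp <- read.csv(fnms[i])
  # by = "individual", or whatever variable you're merging on
  x <- merge(x, temp, by = "individual", all = TRUE)
}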
  • Hi Christopher, thanks for your suggestion. This works, but results in the duplication of many columns in the final data frame. Do you know a way to keep merge from duplicating columns? I've tried the join functions in dplyr, but the output contains NAs instead of my actual data – Brian Aug 07 '17 at 16:35
  • rm(list = ls()); filenames <- list.files(path = "./Father", full.names = TRUE); import.list <- llply(filenames, read.csv); data <- full_join(as.data.frame(import.list[1]), as.data.frame(import.list[2])); for (i in 3:length(import.list)) { data <- left_join(data, as.data.frame(import.list[i])) } – Brian Aug 07 '17 at 16:43
  • You could try something like this on each iteration, before the merge, so that temp keeps only the columns x doesn't already have: current.varnames <- names(x)[names(x) != "individual"]; temp <- read.csv(fnms[i]); temp <- temp[, !(names(temp) %in% current.varnames), drop = FALSE] – Christopher Anderson Aug 07 '17 at 17:46
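The duplicated columns come from merge() itself: any non-key column present in both inputs is kept twice, with .x and .y suffixes. Below is a minimal sketch of the idea from the last comment, using invented toy data (the columns age, height and score are assumptions, not taken from the original files). It drops from the incoming data frame every non-key column that x already has, then merges.

```r
# Toy data (hypothetical): both data frames carry an "age" column.
x    <- data.frame(individual = 1:3, age = c(30, 41, 25),
                   height = c(170, 165, 180))
temp <- data.frame(individual = 2:4, age = c(41, 25, 52),
                   score = c(10, 20, 30))

# A plain merge keeps both copies of the shared column as age.x and age.y.
dup <- merge(x, temp, by = "individual", all = TRUE)
names(dup)

# Keep the key plus only those columns of temp that x does not already have.
new_cols  <- setdiff(names(temp), names(x))
temp_trim <- temp[, c("individual", new_cols), drop = FALSE]

clean <- merge(x, temp_trim, by = "individual", all = TRUE)
names(clean)
```

Note that rows found only in temp end up with NA in the dropped columns (here, age for individual 4), so this only makes sense when the shared columns really hold the same information in every file.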