0

I have a folder with many csv files. Each csv file has the same number of columns. I would like to combine them creating a column with the file name. I've seen similar post about it but I cannnot get it to work. I'm trying the following script. Any help will be much appreciated

library(plyr)
csvfilenames <- list.files("/Users/PAM/Desktop/CSVFilesToMerge/",
                           pattern="*.csv", all.files=FALSE, full.names=FALSE)
csvfilenames

library(plyr)
CombinedData = ldply(csvfilenames, function(filename) {
dum = read.table(filename, header=TRUE, fill = TRUE, sep=";")
dum$Filename = csvfilenames
return(dum)
})
CombinedData`

This is my next attempt with purrr but it doesnt work

library(readr) # for read_csv()
library(purrr) #for map(), reduce()

data_path<-"/Users/PAM/Desktop/TestCSVFilesToMerge/"

files <- dir(data_path, pattern = "*.csv") # get file names
files


data <- files %>% # read in all the files, appending the path before the filename
map(~ read_csv(file.path(data_path, .))) %>% 
reduce(rbind)
data
Samuel Liew
  • 76,741
  • 107
  • 159
  • 260
Carlos
  • 47
  • 6
  • what about base R with lapply and read.table() ? Then you can use the names of the list to `dplyr::bind_rows()` into a unified `data.frame` – Matias Andina Apr 13 '18 at 01:31
  • `library(purrr); combined_data <- csvfilenames %>% map_dfr(read.csv2, .id = 'filename')` – alistaire Apr 13 '18 at 02:29
  • 1
    https://stackoverflow.com/a/48105838/786542 – Tung Apr 13 '18 at 02:42
  • This is my next attempt with purrr but I get an error `library(readr)` # for read_csv() `library(dbplyr)` # for mutate() `library(tidyr)` # for unnest() `library(purrr)` #for map(), reduce() `data_path<-"/Users/PAM/Desktop/TestCSVFilesToMerge/"` `files <- dir(data_path, pattern = "*.csv")` # get file names `data <- files %>%` # read in all the files, appending the path before the filename `map(~ read_csv(file.path(data_path, .))) %>% reduce(rbind)` `data` – Carlos Apr 13 '18 at 07:11

3 Answers3

0

This should give you exactly what you want,

csvfilenames <- list.files("/Users/PAM/Desktop/CSVFilesToMerge/",
                       pattern="*.csv", all.files=FALSE, full.names=FALSE)

for(i in 1:length(csvfilenames))
{
temp<-read.csv(paste(csvfilenames[i])) #give path if its not your wd
temp$filename_tag<-paste(csvfilenames[i])

list[[i]]<-temp
}

outfile<-as.data.frame(do.call(rbind,list))
Alok VS
  • 176
  • 10
  • Thank you Alok. However, I include the path in the read.csv as you mentioned `temp<-read.csv(file="/Users/PAM/Desktop/CSVFilesToMerge/firstCSVfile.csv", sep=";", header=T, paste(csvfilenames[i]))` and I get the following error "EOF within quoted stringError in $<-.data.frame(*tmp*, "filename_tag", value = "firstCSVfile.csv") : replacement has 1 row, data has 0" – Carlos Apr 13 '18 at 13:36
  • Just change the line to, temp<-read.csv(file=paste("/Users/PAM/Desktop/CSVFilesToMerge/",csvfilenames[i],sep=""), sep=";", header=T) – Alok VS Apr 13 '18 at 16:39
  • Hi Alok, Thanks for your suggestion but I get a Error in list[[i]] <- temp : object of type 'builtin' is not subsettable. Do you have any idea what may be wrong? – Carlos Apr 13 '18 at 21:33
  • Hi Alok, see the code in my next post. The one where I said that "I managed to get it to work. However..." (the one below this one). The problem is that when I put all the "list" together it creates a lot of new rows. I think the problem in in the `outfile<-as.data.frame(do.call(rbind,list))` because the "temp2" files that the loop creates seem to be fine. – Carlos Apr 15 '18 at 10:03
0

I managed to get it to work. However, I was wondering if someone could help me cleaning up the script a bit. This script also transposes the data and therefore I have to manually delete all the extra rows that creates (one per column). I think the problem is in the last line of the script when I transform the list in a data frame.

csvfilenames <- list.files("/Users/carlos/Desktop/TestCSVFilesToMerge/",
                   pattern="*.csv", all.files=FALSE, full.names=FALSE) #creates a list with the file names
csvfilenames

for(i in 1:length(csvfilenames))  

{
a=csvfilenames[i]
temp1<-read.csv(file=paste("/Users/carlos/Desktop/TestCSVFilesToMerge/",a,sep=""),sep=";", header=T)
temp2<-cbind("FileName"=a,temp1[,1:ncol(temp1)]) #add a column called FileName in position 1

list[[i]]<-temp2
}

outfile<-as.data.frame(do.call(rbind,list))
outfile
Carlos
  • 47
  • 6
  • Please use the [edit](https://stackoverflow.com/posts/49807986/edit) button below your question to add this information in your question itself instead of posting it here. Or perhaps edit this so it is actually an answer to your question. – André Kool Apr 16 '18 at 13:55
0

Similar to the above scenario i have 2 csv files which i have read CSV1:

2016 2016 2017 2017 2018 2018 Value % vs. PY Value % vs. PY Value % vs. PY Data1 2018-09 23663 2,757 5.7 %
Data2 2018-08 7687686 15.1 % Data2 2018-09 987987 15.2 %

CSV2: 2016 2016 2017 2017 2018 2018 Value % vs. PY Value % vs. PY Value % vs. PY Data1 2018-09 456 3000 4%

i would like to add filename as column name too. i used following:

x1= read.csv(choose.files(),F,"\t", skipNul = FALSE)
x2= read.csv(choose.files(),F,"\t", skipNul = FALSE)
merged_files= do.call(rbind, list(x1= x1, x2= x2))

but its adding x1in each row. Cna anybody help? I tried to use the code above but its giing me error In file(file, "rt") : cannot open file.

Jana Roy
  • 1
  • 2