3

I have a folder with hundreds of .csv files that I would like to merge into a single data frame.

I've been using this code which works relatively well but only for files with matching headers.

    df <- do.call(rbind, lapply(filenames , read.csv))

Two Part Question.

1.) I would like to append this do.call code to include an additional column that includes the .csv file name for each row of data

2.) I would also like to be able to merge .csv files with non-matching headers. Each file has four columns (with different names) but the data type matches. So I want to force the files to bind together regardless of column header.

This post has been helpful but not for merging files with non-mathcing headers:

Merge multiple CSV files and remove duplicates in R

I've also used this code; again only for files with matching headers:

    for (file in filenames){
   # if the merged dataset doesn't exist, create it
    if (!exists("dataset")){
     dataset <- read.csv(file, header=TRUE, sep="\t")
    }
   # if the merged dataset does exist, append to it
    if (exists("dataset")){
     temp_dataset <-read.csv(file, header=TRUE, sep="\t")
     dataset<-rbind(dataset, temp_dataset)
     rm(temp_dataset)
    }
   }
Community
  • 1
  • 1
Flammulation
  • 336
  • 2
  • 16
  • What is very unclear here is what you would like to happen with non-matching columns? Do you delete them, do you keep them all? I f the latter, what happens with files that don't have the new columns – Ricardo Saporta Oct 03 '13 at 19:18
  • @RicardoSaporta - I edited the question slightly - Each file has four columns (with different names) but the data type matches. So I want to force the files to bind together regardless of column header. – Flammulation Oct 03 '13 at 19:23
  • are the columns always in the same order, from one file to the next? If so, just read them in with `header=FALSE` and then rbind them – Ricardo Saporta Oct 03 '13 at 20:02
  • @RicardoSaporta - Yes all columns are in the same order in each file. – Flammulation Oct 03 '13 at 20:07
  • @RicardoSaporta - Your suggestion works well, however, when I include the `header=FALSE` bit I get an extra row of data with the header information from the respective .csv file. I end up having a few hundred extra rows of data I do not need. I guess I can always sort and drop those rows. Just looking for a quicker way. Thanks. – Flammulation Oct 03 '13 at 20:14

1 Answers1

0

What about:

df <- do.call(rbind, lapply(filenames , 
                        function(x) data.frame(filname = x, read.csv(x)))
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • 1
    This works great for adding the a column with file names. Any idea about incorporating the ability to merge files with non-matching column names? – Flammulation Oct 03 '13 at 19:09
  • 1
    You need to either overwrite the existing names or provide a rule for matching names and deleting non-matches. The question is too vague at the moment. Provide a test case by editing your question. – IRTFM Oct 03 '13 at 19:11
  • 1
    I incorporated the code for adding the file name from @DWin with the `header=FALSE` addition from @RicardoSaporta and ended up with this code: `dataset<- ldply(filenames, function(x) data.frame(filname = x, read.csv(x,header=FALSE)))` – Flammulation Oct 03 '13 at 20:34
  • much simpler: take @Dwin's suggestion, and wrap the `lapply` portion inside of `unname(.)` – Ricardo Saporta Oct 03 '13 at 21:06
  • I think you would need to add skip=1 if you were going to say header=FALSE. And I would wrap `unname` around the `read.csv(x)` rather than around the `lappy` if you want the names of the files preserved. – IRTFM Oct 03 '13 at 21:54
  • To be (possibly) a bit clearer, change `read.csv(x)` to _either_ `read.csv(x, header=FALSE, skip=1)` or to `unname(read.csv(x))`. Either approach will get rid of the names of the columns (either by not ever reading them in the first place or by discarding them after reading them in). – Brian Diggs Oct 03 '13 at 23:05