0

I have multiple Excel files, each with two columns. One being the same and the other being different for different files.

How can I merge all the files by their common column and save them as a new file?

Note:

I don't wish to combine them one under the other (using rbind). Instead, I want to merge them based on the commun column.

For R:

I have filenames in the following format.

percent- 15  knt 03_01__00_51.csv
percent- 20  knt 03_01__00_54.csv
percent- 25  knt 03_01__00_57.csv

Due to the file names format I can't write a script to read each file individually. I don't know how to write a loop that iterates over just 15 ,20 .... and leaves the end part somehow.

talat
  • 68,970
  • 21
  • 126
  • 157
Abhishek Bhatia
  • 9,404
  • 26
  • 87
  • 142

1 Answers1

2

You can probably do something like:

# if those are the only files in the folder, you don't even need a pattern
filelist <- list.files(pattern = "^percent.*\\.csv$")    # read all file names according to pattern
files <- lapply(filelist, read.csv, header=TRUE)         # read all files in filelist
files <- lapply(files, function(x) x[-1])                # remove first column of each file
DF = Reduce(function(...) merge(..., by = "CommonColumn", all=T), files)   # merge all files
x <- sub("^(percent- )(\\d+)(\\s.*)$", "\\2", filelist)  # get the file name numbers 
names(DF[-1]) <- paste(names(DF[-1]), x, sep = "-")      # add file name numbers to column names in DF
write.csv(DF, "myfile.csv")                              # write data to new file

The Reduce() part is taken from here.

Community
  • 1
  • 1
talat
  • 68,970
  • 21
  • 126
  • 157
  • Works great. There is one problem though. Each file has three columns- Serial No. , Common Column , Different Colunm. Using the above code the Serial No. also gets copied from each file. – Abhishek Bhatia Jan 02 '15 at 20:57
  • Is there anyway. If I can select two common columns or select which different column to taken from each file. Or any other way you suggest to resolve the issue. – Abhishek Bhatia Jan 02 '15 at 20:58
  • Thanks! Yeah I wish to include serial no column and it is the first column. Can you please can make a edit in the original answer. It would better to understand that way. – Abhishek Bhatia Jan 02 '15 at 21:14
  • Are you saying that you want to use serial no column as a second common column? Your descriptions/comments are not very clear – talat Jan 02 '15 at 21:17
  • Sorry. I don't want use serial no as a column. So the output would- ..... . Here is the second column and is the third one in each file and the first. – Abhishek Bhatia Jan 02 '15 at 21:19
  • Thanks! Great. Could it possible to name the different columns by filenames as well? For example, consider the different column name is avgTime, so the output file should contain ... – Abhishek Bhatia Jan 02 '15 at 21:33