How to merge multiple xlsx files into a single xlsx file by the first part of each file's name using R?

Question

I have several .xlsx files in a directory:

Russia - GDP.xlsx
Russia - GNP.xlsx
USA - GDP.xlsx
USA - GNP.xlsx

I would like to combine the files according to the first part of the files name into a new xlsx file. So the output would look like this:

Russia.xlsx
USA.xlsx

With each .xlsx file containing two tabs: GDP and GNP.

Is there a way to do this using R? Thanks for your help.

@JeanVuda I've looked around for options, but unfortunately it's unclear to me that you can do this in R. One thing that came to mind was to read the files into R, then export them using the `write.xlsx` function. Wasn't sure if one can batch combine files by name in a directory since that would be (seemingly) quicker. — Natasha Rostov, Mar 29 '18 at 15:10
Yeah, That would be the approach and you can do all that in R. — JeanVuda, Mar 29 '18 at 15:13
@JeanVuda Thanks. Any suggestions for combining by name in R? Is there a way to loop it so it's writing xlsx files according to the first part of the file? (Russia, USA, etc.) — Natasha Rostov, Mar 29 '18 at 15:16

Deepak Rajendran · Answer 1 · 2018-03-29T16:33:57.120

Assuming that the file name is consistent to <country> - GDP.xlsx and <country> - GNP.xlsx:

library(xlsx)

# Change following as needed
path <- "C:/OneXLSX/"

# Fetch only the files from the folder
files <- list.files(path, pattern = "*.xlsx")

# Get the countries name only
countries <- unique(unlist(strsplit(files, " -(.*)")))

# Loop through each country, read GDP/GNP and write them to one single file
for (each in countries){
  GDP <- read.xlsx(paste(path, each, " - GDP.xlsx", sep = ""), 1, header = TRUE)
  GNP <- read.xlsx(paste(path, each, " - GNP.xlsx", sep = ""), 1, header = TRUE)

  write.xlsx(GDP, paste(path, each, ".xlsx", sep = ""), 
             sheetName = "GDP", row.names = FALSE)
  write.xlsx(GNP, paste(path, each, ".xlsx", sep = ""), 
             sheetName = "GNP", append = TRUE, row.names = FALSE)
}

My package of choice to work with Excel files is xlsx. Please feel free to change it to your preferred package and update the syntax as needed.

I apologize if the code is not pretty.

JeanVuda · Answer 2 · 2018-03-29T16:42:52.107

You can try this:

# Reading all files in a folder
library(openxlsx)
path = "C:\\Users\\Folder"
files<-list.files(path, pattern='*.xlsx$',
                  full.names=TRUE)

# Assuming you have these file names
files<-c("Russia - GDP.xlsx",
         "Russia - GNP.xlsx",
         "USA - GDP.xlsx",
         "USA - GNP.xlsx")

names(files)<-files

# Getting the unique countries
rexp <- "(.*)\\s+\\-.*"
uniqueCountries<- unique(sub(rexp, "\\1",names(files)))

toread<-lapply(uniqueCountries, function(x) grep(x,files))
names(toread)<-uniqueCountries

lapply(toread, function(x) {#print(x)
  xy<-files[x]
  dat<-lapply(xy, read.xlsx)
  fileN=sprintf("%s.xlsx",x)
  write.xlsx(dat, fileN)
})

How to merge multiple xlsx files into a single xlsx file by the first part of each file's name using R?

2 Answers2