0

I want to read a bunch of excel files all located in the same directory and store them in different sheets in a consolidated Excel file.

I initially tried using XLConnect but kept getting the error GC overhead limit exceeded. I stumbled upon this question which says that it is a common problem with Java based Excel handling packages such as XLConnect and xlsx. I tried the memory management trick suggested there, but it did not work. One of the comments in one of the comments on the accepted answers suggested using openxls as it based on RCpp and hence avoid this particular problem.

My current code is as follows:

library(openxlsx)
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE)  #pattern match as multiple files are from the same month
# Read them into a list and write to sheet
wb <- createWorkbook()
lapply(files, function(x){
  print(x)
  xlFile<-read.xlsx(xlsxFile = x, sheet = 1, startRow = 2, colNames = T)  #Also tried
  str(xlFile)
  #Create a sheet in the new Excel file called Consolidated.xlsx with the month name
  #Append current data in sheet
})

The problem I am getting is the error: Error in read.xlsx.default(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) : openxlsx can not read .xls or .xlm files!

I have ensured that files variable contains all the files of interest (Ex: January 2015.xls, January 2016.xls, etc). I have also ensured that the path to the file is correct and the Excel files actually exists there.

I have left the writing to Excel as skeleton code as I need to solve the problem with reading the files first.

In case it helps, here is the code attempt with XLConnect

library(XLConnect)

setwd("D:/something/something")
mnth="January"
files <- list.files(path="./Original Files", pattern=mnth, full.names=T, recursive=FALSE)
# Read them into a list
df.list = lapply(files, readWorksheetFromFile, sheet=1, startRow=2)
#combine them into a single data frame and write to disk:
df = do.call(rbind, df.list)
rm(df.list)
outputFileName<-"Consolidated.xlsx"
# Load workbook (create if not existing)
wb <- loadWorkbook(outputFileName, create = TRUE)
createSheet(wb, name = mnth)
writeWorksheet(wb,df,sheet = mnth)
#write.xlsx2(df, outputFileName, sheetName = mnth, col.names = T, row.names = F, append = TRUE)
saveWorkbook(wb)

rm(df)
gc()
Community
  • 1
  • 1
DotPi
  • 3,977
  • 6
  • 33
  • 53
  • I'm very confused. You are lising files from `files <- list.files(path="./Original Files", ...)` but then are trying to open those file names with `xlsxFile <- system.file(x, package = "openxlsx")`. The `system.file` function opens files included with package distributions which does not seem to be what you are trying to do. I think you are just using a bad path. – MrFlick Dec 12 '16 at 19:07
  • I also tried using `xlFile<-read.xlsx(xlsxFile = x, sheet = 1, startRow = 2, colNames = T) `. I will update the question with the error I got with that. – DotPi Dec 12 '16 at 19:08
  • 2
    The new error message seems pretty clear. The `openxlsx` package cannot read old-style `xls` files; it only reads `xlsx` files. – MrFlick Dec 12 '16 at 19:18
  • Thanks! Can you add that as an answer? I just assumed that since the `xlsx` package read both, so can `openxlsx`. – DotPi Dec 12 '16 at 19:32

0 Answers0