8

I am trying to import a large xlsx file into R that has many sheets of data. I was attempting to do this through XLConnect, but java memory problems (such as those described in this thread have prevented this technique from being successful.)

Instead, I am trying to use the openxlsx package, which I have read works much faster and avoids Java altogether. But is there a way to use its read.xlsx function within a loop to read in all of the sheets into separate dataframes? The technique I was using with the other package is no longer valid bc commands like loadWorkbook() and getSheets() can no longer be used.

Thank you for your help.

Community
  • 1
  • 1
Z_D
  • 797
  • 2
  • 12
  • 30

3 Answers3

8

I think the getSheetNames() function is the right function to use. It will give you a vector of the worksheet names in a file. Then you can loop over this list to read in a list of data.frames.

read_all_sheets = function(xlsxFile, ...) {
  sheet_names = openxlsx::getSheetNames(xlsxFile)
  sheet_list = as.list(rep(NA, length(sheet_names)))
  names(sheet_list) = sheet_names
  for (sn in sheet_names) {
    sheet_list[[sn]] = openxlsx::read.xlsx(xlsxFile, sheet=sn, ...)
  }
  return(sheet_list)
}

read_all_sheets(myxlsxFile)
ichbinallen
  • 1,019
  • 12
  • 18
7

Doing nothing more than perusing the documentation for openxlsx quickly leads one to the function sheets(), which it states is deprecated in place of names() which returns the names of all the worksheets in a workbook. You can then iterate over them in a simple for loop.

I'm not sure why you say that loadWorkbook cannot be used. Again, the documentation clearly shows a function in openxlsx by that name that does roughly the same thing as in XLConnect, although it's arguments are slightly different.

You can also look into the readxl package, which also does not have a Java dependency.

joran
  • 169,992
  • 32
  • 429
  • 468
  • Okay thank you. Apologies, sloppy of me to not notice the similar command in openxlsx. I was referring strictly to the XLConnect functions. – Z_D Apr 06 '15 at 23:37
1

'sapply' also can be used.

read_all_sheets = function(xlsxFile, ...) {
  sheet_names = openxlsx::getSheetNames(xlsxFile)
  sheet_list = sapply(sheet_names, function(sn){openxlsx::read.xlsx(xlsxFile, sheet=sn, ...)}, USE.NAMES = TRUE)
 return(sheet_list)
}
Raja
  • 157
  • 1
  • 11