2

I would like to read in multiple xlsx files into R that have multiple sheets, I have a header (colnames) for the first sheet of each file but I don't have any for the rest of the sheets, however, it's the exact same columns.

I found that solution in a post:


dir_path <- "~/test_dir/"         # target directory path where the xlsx files are located. 
re_file <- "^test[0-9]\\.xlsx"    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc, but could simply be 'xlsx'.

read_sheets <- function(dir_path, file){
  xlsx_file <- paste0(dir_path, file)
  xlsx_file %>%
    excel_sheets() %>%
    set_names() %>%
    map_df(read_excel, path = xlsx_file, .id = 'sheet_name') %>% 
    mutate(file_name = file) %>% 
    select(file_name, sheet_name, everything())
}

df <- list.files(dir_path, re_file) %>% 
  map_df(~ read_sheets(dir_path, .))

but I can't figure out why it won't work, I get this error.

Error in set_names(.) : 1 argument passed to 'names<-' which requires 2
M--
  • 25,431
  • 8
  • 61
  • 93
Stella
  • 69
  • 1
  • 10
  • set_names() needs the vector to be named and a vector with the names. In the block of code set_names() is passed the vector to be named, but not the vector of names to be given to it. Try excel_sheets() %>% unlist() %>% set_names() %>% – Pablo Rod Jun 12 '19 at 11:06
  • Where did you find this example? What exactly are you trying to do with set_names? – Sven Jun 12 '19 at 11:10
  • i got in a response to this post [link](https://stackoverflow.com/questions/38197705/read-multiple-xlsx-files-with-multiple-sheets-into-one-r-data-frame), I don't know what set_names is for – Stella Jun 12 '19 at 11:17
  • @PabloRod it gives the same error when adding unlist, i tried without the set_names instruction and it gives me that error: Error in as_mapper(.f, ...) : argument ".f" is missing, with no default – Stella Jun 12 '19 at 11:21
  • Sorry, try removing set_names() %>% – Pablo Rod Jun 12 '19 at 11:22
  • @PabloRod i did and it gave me another error message: ` Error in as_mapper(.f, ...) : argument ".f" is missing, with no default` – Stella Jun 12 '19 at 11:28
  • if you have antother way so that i can read all the sheet of my xlsx files, and bind them all together, i am all ears – Stella Jun 12 '19 at 12:30
  • Do all the sheets have the exact same columns and rows? – Sven Jun 12 '19 at 12:59
  • Yes @Sven, I have a header ( colnames) for the first sheet of each file but i don't have any for the rest of the sheets – Stella Jun 12 '19 at 13:34
  • @Kaina, that might explain your issue and is important information to add in your question. Please add what your files and sheets look like. – Sven Jun 12 '19 at 13:42
  • @Sven it's done – Stella Jun 12 '19 at 13:55
  • Waht about purrr::set_names() instead of set_names() ? – MrSmithGoesToWashington Mar 03 '20 at 08:37
  • 1
    Many thanks for that question! Just solved a big problem that I had! :-) As I wanted all the xlsx files in the same path, I just used: ' re_file <- ".xlsx" ' and it worked just fine. – Érica Wong Apr 20 '21 at 16:59

2 Answers2

0

I've created this readxl solution with 2 excel workbooks, each having 2 sheets with the same columns. In your problem the 2nd sheet (and further) doesn't have the colnames, so they needed to be set with an additional if statement. It's probably not the fastest solution, but it works:

library(readxl)    

#Set path
inputFolder <- "test/"

#Get list of files
fileList <- list.files(path = inputFolder, recursive=T, pattern='*.xlsx')

#Read in each sheet from each excel
for (f in 1:length(fileList)){
  #Find the number of sheets in this workbook
  sheetList <- excel_sheets(paste(inputFolder, fileList[f], sep = ""))
  #Get the sheets of this workbook
  for (s in 1:length(sheetList)) {
    tempSheet <- read_excel(paste(inputFolder, fileList[f], sep = ""), sheet = sheetList[s])
    if (f == 1 & s == 1) {
      df <- tempSheet
    }
    else {
      if(s != 1) {
        names(tempSheet) <- names(df)
      }
      df <- rbind(df,tempSheet)
    }
  }
}
Sven
  • 1,203
  • 1
  • 5
  • 14
  • @Kaina, you're very welcome. Please accept the answer if it helped you. – Sven Jun 12 '19 at 15:55
  • after using the code to read multiple files with multiple sheet, I realized that it dosen't read correctly the files, it puts NA's where it shouln't, can you please help me on this ? or anyone else ? – Stella Jun 19 '19 at 16:04
  • Is there anything specific about the files or sheets where it doesn't read them correctly? Are you saying it doesn't read the files at all, or puts NA's in some cells that shouldn't be NA's? – Sven Jun 19 '19 at 16:09
  • it does read the files, but it has errors in it, there is NA's values in some cells where it souldn't be – Stella Jun 19 '19 at 16:42
  • Then it's probably better to raise a new question for that. – Sven Jun 19 '19 at 16:51
  • ook @Sven, thank you for your help – Stella Jun 19 '19 at 16:56
0

That seems to work. Here is a different means to the same end.

library(XLConnect)

testDir <- "C:\\your_path_here\\"

re_file <- ".+\\.xls.?"
testFiles <- list.files(testDir, re_file, full.names = TRUE)

# This function rbinds in a single dataframe
# the content of multiple sheets in the same workbook
# (assuming that all the sheets have the same column types)
rbindAllSheets <- function(file) {
  wb <- loadWorkbook(file)
  sheets <- getSheets(wb)
  do.call(rbind,
          lapply(sheets, function(sheet) {
            readWorksheet(wb, sheet)
          })
  )
}

# Getting a single dataframe for all the Excel files
result <- do.call(rbind, lapply(testFiles, rbindAllSheets))
ASH
  • 20,759
  • 19
  • 87
  • 200
  • thank you for your response, i tried it and it gave me this error ` Error: POIXMLException (Java): java.lang.reflect.InvocationTargetException` – Stella Jul 14 '19 at 17:42