0

I am combining multiple excel files with one to many worksheets. They each have different columns. I am only interested in combining the worksheets with address information. In the case of a worksheet with no address information, I need to make a note of it in the resulting combined file.. In the case that I am having problems, one of the worksheets has vegetables and no addresses, and the other has address information.. I am using the code below to put them together. After I get it to work I will clean standardize them and put them altogether.

dir_path <- "C:/temp/ConsigneeList/stuff4/"         # target directory where the xlsx files are located. 
re_file <- list.files(dir_path, pattern=".xls*")    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc.

read_sheets <- function(dir_path, file){
  xls_file <- paste0(dir_path, file)
  xls_file %>%
    excel_sheets() %>%
    set_names() %>%
    map_df(read_excel, path = xls_file, .id = 'sheet_name') %>% 
    mutate(file_name = file) %>% 
    select(file_name, sheet_name, everything())
}

number_of_excel_files<-length(file.list)
mybiggerlist<-vector('list',number_of_excel_files)
for(file in 1:length(mybiggerlist)) {

  mybiggerlist[[file]]<-read_sheets(dir_path, file.list[file])
  
}

I am getting the error: Error: Can't combine Customer Quick REF$Order No and CH Belt$Order No . I tried to use %>% mutate_all(as.character) as the columns should all be character in nature.. Any ideas on how I could solve this? Alternatively, is there a way to skip importing the offending data and make a row indicating that there was a problem with that worksheet? Thank you!

tom
  • 315
  • 1
  • 3
  • 10
  • Could you please make your example reproducible? Sharing a small sample of 5 rows from each of 2 sheets that cause a problem would be idea. Use `dput()` to share so that it is copy/pasteable and all relevant class and structure information is included. Or if the problem is a single Excel workbook, maybe you could post that somewhere? – Gregor Thomas Nov 16 '21 at 14:49
  • It seems like your error is in reading in a single file within the `read_sheets` function, so the `for` loop is probably irrelevant... Though if you just want to skip the offending data then see `?tryCatch` and the [FAQ on writing tryCatch](https://stackoverflow.com/q/12193779/903061) or read the [Conditions chapter of Advanced R](https://adv-r.hadley.nz/conditions.html) – Gregor Thomas Nov 16 '21 at 14:52
  • Hi Gregor, I have an example excel file but dont know how to attach it or where to send it.. – tom Nov 16 '21 at 15:41
  • Put it up on some file sharing service - Dropbox, pastebin, google drive... Or at least describe in your question the difference between an Excel file that works and an Excel file that doesn't. – Gregor Thomas Nov 16 '21 at 18:22
  • I am only interested in combining the worksheets with address information. In the case of a worksheet with no address information, I need to make a note of it in the resulting combined file.. In the case that I am having problems, one of the worksheets has vegetables and no addresses, and the other has address information.. – tom Nov 16 '21 at 18:55
  • So you need an `if` statement then, in your function. `if(...)` and write `...` to test if there are addresses, and then act accordingly. – Gregor Thomas Nov 16 '21 at 19:08
  • Where would you put the if statement? Thank you! – tom Nov 16 '21 at 19:31

1 Answers1

1

Try something like this:

dir_path <- "C:/temp/ConsigneeList/stuff4/"         # target directory where the xlsx files are located. 
re_file <- list.files(dir_path, pattern=".xls*")    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc.

read_sheets <- function(dir_path, file){
  xls_file <- paste0(dir_path, file)
  sheets <- xls_file %>%
    excel_sheets() %>%
    set_names() %>% ## not really sure if this is doing anything?
    map(read_excel, path = xls_file)
    
    # Now we have all the sheets in a list. 
    # Time to figure out which ones to combine
    # Use purrr::keep to only keep sheets that meet some condition
    # I just put in a wild guess, edit the test so that only sheets
    # you want are kept
    sheets <- purrr::keep(sheets, ~ "Address" %in% names(.))    
    
    bind_rows(sheets, .id = 'sheet_name') %>%
      mutate(file_name = file) %>% 
      select(file_name, sheet_name, everything())
}
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294