0

I receive the same data each week that comes with the same names on the excel tabs. I need to first group all the sheets with the same name and do an operation on them before I combine them all together. For example:

    Week1.xlsx (workbook) 
     - Blue (sheet) - has a column named type that just says "Color"
     - Red (sheet) - has a column named type that just says "Color"
     - Green (sheet)` - has a column named type that just says "Color"


    Week2.xlsx (workbook) 
     - Blue (sheet) - has a column named type that just says "Color"
     - Red (sheet) - has a column named type that just says "Color"
     - Green (sheet) - has a column named type that just says "Color"

etc.

I need to combine all the common sheets together and modify the type column to say which color sheet it came from. If I combined them all together, then the type column will only be populated with "Color", so the name of the sheet is the only piece of information that tells me how to organize it.

Final output: all-data.xlsx

Type
Color - Blue
Color - Red
Color - Blue
Color - Green
.
.
.

I don't necessarily need help with the second part (modifying the columns), I just want to know how to combine all the common sheets together (possibly in a list) that I can them access by name.

  • 2
    I don't think I understand the question. Which part of the code do you need help with? Can you reduce the code to include only that part? How should your final output look like? – Ronak Shah Feb 25 '21 at 08:58
  • Look at the [readxl package](https://readxl.tidyverse.org/), part of the tidyverse. – Limey Feb 25 '21 at 09:44
  • @RonakShah I need help with taking multiple excel sheets and combining all the tabs that have common names. I edited it for clarity. I think once I get that I will be able to get through everything else. – Adrian Fletcher Feb 25 '21 at 18:28

1 Answers1

0

EDIT: I was not sure how to mark this as duplicate, but I just submitted it for review. Please see the post below for more answers.

Ok I found an answer from this post that perfectly sums the issue. Specifically, the portion that says .id = 'sheet_name' accomplishes my goal by assigning each row to the name of the sheet.

library(tidyverse)
library(readxl)

dir_path <- "~/test_dir/"         # target directory path where the xlsx files are located. 
re_file <- "^test[0-9]\\.xlsx"    # regex pattern to match the file name format, in this case 'test1.xlsx', 'test2.xlsx' etc, but could simply be 'xlsx'.

read_sheets <- function(dir_path, file){
  xlsx_file <- paste0(dir_path, file)
  xlsx_file %>%
    excel_sheets() %>%
    set_names() %>%
    map_df(read_excel, path = xlsx_file, .id = 'sheet_name') %>% 
    mutate(file_name = file) %>% 
    select(file_name, sheet_name, everything())
}

df <- list.files(dir_path, re_file) %>% 
  map_df(~ read_sheets(dir_path, .))