2

I have a directory full of .xlsx files. They all have multiple sheets. I want to extract the same sheet from all of the files and append them into a tibble.

I have found numerous solutions for extracting multiple sheets from a single Excel file; however, not a single sheet from multiple files.

I have tried:

    paths = as.tibble(list.files("data/BAH", pattern = ".xlsx", full.names = TRUE, all.files = FALSE))

    test <- paths %>% read_xlsx(sheet = "Portal", col_names = TRUE)

I know the "paths" variable contains all of my file names with path. However, I am not sure how to iterate through each file name appending just the specific sheet = "Portal" to a csv file.

The error is:

Error: path must be a string

I have tried to pass in paths as a vector, as a tibble, and tried sub-scripting it as well. All fails.

So, in summary. I have a directory of xlsx files and I need to extract a single sheet from each one and append it to a csv file. I have tried using purrr with some map functions but also could not get it to work.

My goal was to use the Tidy way.

Thanks for any hints.

DevGin
  • 443
  • 3
  • 12

2 Answers2

4

You have to use lapply() or map(). Try

test <- lapply(paths, read_xlsx, sheet = "Portal", col_names = TRUE)

or

library(purrr)
test <- map_dfr(paths, read_xlsx, sheet = "Portal", col_names = TRUE)

You can then bind the dataframes with

library(dplyr)
test %>% bind_rows()
eastclintw00d
  • 2,250
  • 1
  • 9
  • 18
  • Your answer worked. Do you know a way to add a column with the file name string added? – DevGin Jul 09 '19 at 15:18
  • how do you do this if you are passing a table to map_dfr with a column for the path and a column for the sheet? When I do map_dfr(tibble_df, ~read_excel(path = tibble_df$path, sheet = tibble_df$sheet)) I get Error in tibble_df$path : $ operator is invalid for atomic vectors – jzadra Mar 23 '20 at 22:13
1
library(tidyverse)    
library(readxl)
library(fs)

# Get all files
xlsx_files <- fs::dir_ls("data/BAH", regexp = "\\.xlsx$")

paths = as_tibble(list.files("data/BAH", pattern = ".xlsx", full.names = TRUE, all.files = FALSE))


#portal_tabs <- map_dfr(paths, read_xlsx, sheet = "Portal", col_names = TRUE)
portal_tabs <- map_dfr(xlsx_files, read_xlsx, sheet = "Portal", col_names = TRUE, .id = 'source')
DevGin
  • 443
  • 3
  • 12