I'm writing code to:
read a list of folders
sort and extract folders with certain text elements from the list
get full file names from each folder
find the tabs/sheets inside each file
loop/lapply read.xlsx() over my nested list of files
The ultimate goal is to read all tabs/sheets from their respective files from their respective folders, while creating columns to identify what tab/sheet they're from and what folder they are from, and then merging them all in one go.
My preferred package for reading in excel files is 'openxlsx'.
Here's my code for getting folders and files:
path<- "/Users/jackserna/Google Drive/Folder"
dataFolders<- list.files(path)
dataFolders<- sort(dataFolders[starts_with(match = "FY", vars = dataFolders)])
files<- lapply(lapply(dataFolders, FUN = function(x){
paste(path,x,sep = "/")
}), FUN = function(x){
list.files(x, pattern = "*.xlsx", full.names = TRUE)
})
I have been unable to loop/apply a read function for all my files and all my sheets. I would read no more than 1 folder, which would have to be repeated. I used some code from this post...
data.to.merge <- lapply(lapply(files[[1]], FUN = function(x){
read.xlsx(x, sheet = 3, cols = 1:5)
}), na.omit)
merged.daata <- Reduce(function(...) merge(..., all = T), data.to.merge)
However, this approach will not allow me to append sheet names as an extra column for each sheet I read in. This approach assumes that there is data on sheet #3, but to my dismay that is not the case with these files. Data is scattered across sheets, and some sheets must be ignored in order to merge.
In an attempt to grab all the sheets and parse out the sheets I don't want, here's what was done:
allsheets<- list()
for(i in files){
for(j in i){
sheets<- getSheetNames(j)
allsheets<- cbind(allsheets,sheets)
}
}
But this has turned into a nightmare to be able to use for reading and merging.
How can I get R to understand what I'm trying to accomplish?