I have a data.frame created from other functions which take a list of .xlsx files and reads all workbooks and included sheets.
The result is a nice data frame as such:
df<-data.frame(
file.name <-c(“C:/R/Folder1/WB1.xlsx”,
“C:/R/Folder1/WB2.xlsx,”,”C:/R/Folder1/WB2.xlsx”, “C:/R/Folder2/WB1.xlsx”, “C:/R/Folder2/WB1.xls”),
data<-list(df1,df2,df3,df4,df5))
While I am able to retrieve (unnest) the dataframes, I lose the corresponding file location, which I need.
Is there a way to keep the corresponding row from the dataframe and paste() as a column when unnesting?
*Sorry for typos. Posted on SO app.
Updating (now that I am in front of a PC)
Reproducible Example:
data:
df1<-data.frame(V1=c(sample(900:970,6)),
V2=c(sample(LETTERS[1:6],6)))
df2<-data.frame(V1=sample(750:780,6),
V2=sample(LETTERS[8:16],6))
df3<-data.frame(V1=sample(200:250,6),
V2=sample(LETTERS[10:20],6),
V3=sample(2300:5821,6))
df4<-data.frame(V1=sample(396:480,6),
V2=sample(LETTERS,6))
df5<-data.frame(V1=sample(50:100,6),
V2=sample(LETTERS,6))
df6<-data.frame(V1=sample(200:250,6),
V2=sample(LETTERS,6),
V3=sample(letters,6))
my.list <- list(df1,df2,df3,df4,df5,df6)
mydf<-data.frame(
files=c("C:/Folder1/Data/File1.xlsx","C:/Folder1/Data/File2.xlsx",
"C:/Folder1/Data/File3.xlsx","C:/Folder2/Data/File1.xlsx",
"C:/Folder2/Data/File2.xlsx","C:/Folder2/Data/File3.xlsx"))
mydf$data<-my.list
When attempting to unnest - I run in to the following issue (due to differences in observations and variables within the list of data.frames (column 2):
y<-unnest(mydf, data)
Error: Column `V3` can't be converted from integer to factor
In addition: Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
2: In bind_rows_(x, .id) :
binding character and factor vector, coercing into character vector
3: In bind_rows_(x, .id) :
binding character and factor vector, coercing into character vector...
Results from other functions
#tidyverse
y<-mydf %>% unnest(data)
Error: Column `V3` can't be converted from integer to factor
In addition: Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
y<-mydf %>%
+ unnest(data) %>%
+ group_by(files) %>%
+ mutate(
+ data = flatten_chr(data),
+ data_colname = str_c("data_", row_number())
+ ) %>% # or just as.character
+ spread(data_colname, data)
Error: Column `V3` can't be converted from integer to factor
In addition: Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
Adding utilized function to pull .xlsx and all sheets in - as seen in this example:
library(tidyverse)
library(readxl)
dir_path1 <- "~/File1/Data/Qtr1"
dir_path2 <- "~/File1/Data/Qtr2"
dir_path3 <- "~/File1/Data/Qtr3"
dir_path4 <- "~/File1/Data/Qtr4"
re_file <- ".xlsx"
read_sheets <- function(dir_path1, file){
xlsx_file <- paste0(dir_path1, file)
xlsx_file %>%
excel_sheets() %>%
set_names() %>%
map_df(read_excel, path = xlsx_file, .id = 'sheet_name') %>%
mutate(file_name = file) %>%
select(file_name, sheet_name, everything())
}
df <- list.files(dir_path, re_file) %>%
map_df(~ read_sheets(dir_path, .))
Returns:
# A tibble: 15 x 5
file_name sheet_name col1
<chr> <chr> <dbl>
1 Q1_File1.xlsx Sheet1 1
2 Q1_File1.xlsx Sheet2 1
3 Q1_File2.xlsx Sheet1 1
...
However, unlike the sample data (as shown in the link), the data returned (col1) is a list of data frames.