The accepted answer provides a sufficient solution. However, if the number of files to read is large, it might be worth avoiding for loops altogether and rewriting the solution using `lapply`.
A few additional tweaks could be added:
- You could use `data.table::rbindlist` to bind each file into a single big data.frame / data.table (to avoid multiple data.frames in the global environment). This assumes that all loaded data share the same structure, and it is also more concise. The same can be achieved in base R with `do.call(rbind, your_list)`, but you will lose the "id" column (see the sketch after this list).
- You can implement a simple percentage counter to show how many files have been loaded, using `cat(round(i / length(test) * 100, 3), "% \r")`, which is also noted here.
- A number of packages can be used for this task. I would recommend `openxlsx` and would avoid `xlsx`, since `openxlsx` removes the dependency on Java. `readxl` is also worth mentioning; a `readxl`-based sketch follows the main example below. The solution below uses `openxlsx`.
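As a minimal sketch of the difference from the first point, using two toy data.frames (the names `a`, `b`, `file_a`, and `file_b` are made up for illustration):

a <- data.frame(x = 1:2)
b <- data.frame(x = 3:4)

# rbindlist adds an "id" column taken from the list names
data.table::rbindlist(list(file_a = a, file_b = b), idcol = "id")

# base R equivalent: rows are bound, but no "id" column is created
do.call(rbind, list(file_a = a, file_b = b))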
The following solution should achieve similar results but should be faster than a for loop. As no data example was provided, you will have to play with some of the arguments in `openxlsx::read.xlsx` to specify the range of the sheet you want to load.
# list.files() takes a regex, not a glob, so anchor the extension
test <- list.files(recursive = TRUE, pattern = "\\.xlsx$")

result <- as.data.frame(data.table::rbindlist(
  setNames(lapply(seq_along(test), function(i) {
    # simple progress counter
    cat(round(i / length(test) * 100, 3), "% \r")
    tmp_name <- test[i]
    tmp_sheets <- openxlsx::getSheetNames(tmp_name)
    # only read files that contain the target sheet; the others
    # return NULL, which rbindlist drops silently
    if ("Income Statement" %in% tmp_sheets) {
      openxlsx::read.xlsx(
        tmp_name,
        sheet = "Income Statement",
        startRow = 4)
    }
  }), test), # naming the list makes the "id" column hold the file paths
  idcol = "id"))
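For completeness, a roughly equivalent sketch using `readxl` instead (untested against your files, so treat it as an outline; `readxl::read_excel` takes `skip` rather than `startRow`, so `skip = 3` approximates starting at row 4):

result_readxl <- as.data.frame(data.table::rbindlist(
  setNames(lapply(test, function(tmp_name) {
    # excel_sheets() lists the sheet names without reading the data
    if ("Income Statement" %in% readxl::excel_sheets(tmp_name)) {
      readxl::read_excel(tmp_name, sheet = "Income Statement", skip = 3)
    }
  }), test),
  idcol = "id"))

In both versions, because the list is named with the file paths, the "id" column tells you which file each row came from.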