I have directory with a list of folders which contains a folder named "ABC" . This "ABC" has '.xlsm' files. I want to use a R code to read '.xlsm' files in the folder "ABC", which under different folders in a directory.
Thank you for your help
I have directory with a list of folders which contains a folder named "ABC" . This "ABC" has '.xlsm' files. I want to use a R code to read '.xlsm' files in the folder "ABC", which under different folders in a directory.
Thank you for your help
If you already know the paths to each file, then simply use read_excel
from the readxl
package:
library(readxl)
mydata <- read_excel("ABC/myfile.xlsm")
If you first need to get the paths to each file, you can use a system command (I'm on Ubuntu 18.04) to find all of the paths and store them in a vector. You can then import them one at a time:
myshellcommand <- "find /path/to/top/directory -path '*/ABC/*' -type d"
mypaths <- system(command = myshellcommand, intern = TRUE)
Because of your directory requirements, one method for finding all of the files can be a double list.files
:
ld <- list.files(pattern="^ABC$", include.dirs=TRUE, recursive=TRUE, full.names=TRUE)
lf <- list.files(ld, pattern="\\.xlsm$", ignore.case=TRUE, recursive=TRUE, full.names=TRUE)
To read them all into a list (good ref for dealing with a list-of-frames: http://stackoverflow.com/a/24376207/3358272):
lstdf <- sapply(lf, read_excel, simplify=FALSE)
This defaults to opening the first sheet in each workbook. Other options in readxl::read_excel
that might be useful: sheet=
, range=
, skip=
, n_max=
.
Given a list of *.xlsm
files in your working directory you can do the following:
list.files(
path = getwd(),
pattern = glob2rx(pattern = "*.xlsm"),
full.names = TRUE,
recursive = TRUE
) -> files_to_read
lst_dta <- lapply(
X = files_to_read,
FUN = function(x) {
cat("Reading:", x, fill = TRUE)
openxlsx::read.xlsx(xlsxFile = x)
}
)
Given two files, each with two columns A
, B
and C
, D
the generated list corresponds to:
>> lst_dta
[[1]]
C D
1 3 4
[[2]]
A B
1 1 2
.xlsm
files found in the directory tree starting from getwd()
.openxlsx
is efficient due to the use of Rcpp. If you are going to be handling a substantial amount of MS Excel files this package is worth exploring, IMHO.As pointed out by @r2evans in comments, you may want to read *.xlsm
files that reside only within ABC
folder ignoring *.xlsm
files outside the ABC
folder. You could filter your files vector in the following manner:
grep(pattern = "ABC", x = files_to_read, value = TRUE)
Unlikely, if you have *.xlsm
files that have ABC
string in names and are saved outside ABC
folder you may get extra matches.