1

I have directory with a list of folders which contains a folder named "ABC" . This "ABC" has '.xlsm' files. I want to use a R code to read '.xlsm' files in the folder "ABC", which under different folders in a directory.

Thank you for your help

BVS MURTHY
  • 21
  • 5

3 Answers3

0

If you already know the paths to each file, then simply use read_excel from the readxl package:

library(readxl)
mydata <- read_excel("ABC/myfile.xlsm")

If you first need to get the paths to each file, you can use a system command (I'm on Ubuntu 18.04) to find all of the paths and store them in a vector. You can then import them one at a time:

myshellcommand <- "find /path/to/top/directory -path '*/ABC/*' -type d"
mypaths <- system(command = myshellcommand, intern = TRUE)
DanY
  • 5,920
  • 1
  • 13
  • 33
0

Because of your directory requirements, one method for finding all of the files can be a double list.files:

ld <- list.files(pattern="^ABC$", include.dirs=TRUE, recursive=TRUE, full.names=TRUE)
lf <- list.files(ld, pattern="\\.xlsm$", ignore.case=TRUE, recursive=TRUE, full.names=TRUE)

To read them all into a list (good ref for dealing with a list-of-frames: http://stackoverflow.com/a/24376207/3358272):

lstdf <- sapply(lf, read_excel, simplify=FALSE)

This defaults to opening the first sheet in each workbook. Other options in readxl::read_excel that might be useful: sheet=, range=, skip=, n_max=.

r2evans
  • 141,215
  • 6
  • 77
  • 149
0

Given a list of *.xlsm files in your working directory you can do the following:

list.files(
    path = getwd(),
    pattern = glob2rx(pattern = "*.xlsm"),
    full.names = TRUE,
    recursive = TRUE
) -> files_to_read

lst_dta <- lapply(
    X = files_to_read,
    FUN = function(x) {
        cat("Reading:", x, fill = TRUE)
        openxlsx::read.xlsx(xlsxFile = x)
    }
)

Results

Given two files, each with two columns A, B and C, D the generated list corresponds to:

>> lst_dta
[[1]]
  C D
1 3 4

[[2]]
  A B
1 1 2

Notes

  • This will read all .xlsm files found in the directory tree starting from getwd().
  • openxlsx is efficient due to the use of Rcpp. If you are going to be handling a substantial amount of MS Excel files this package is worth exploring, IMHO.

Edit

As pointed out by @r2evans in comments, you may want to read *.xlsm files that reside only within ABC folder ignoring *.xlsm files outside the ABC folder. You could filter your files vector in the following manner:

grep(pattern = "ABC", x = files_to_read, value = TRUE)

Unlikely, if you have *.xlsm files that have ABC string in names and are saved outside ABC folder you may get extra matches.

Konrad
  • 17,740
  • 16
  • 106
  • 167
  • Given the premise that only the XLSM files within "ABC" directories are of interest, this may identify extra files if other non-ABC-based files exist. – r2evans Aug 08 '18 at 16:23