I have a folder with numerous xlsx files that all need to be formatted in exactly the same way. I want to read them into R and store them in a list whose elements can be referenced by the xlsx file name, so that I can feed them through my formatting code. This is the code I found; it labels them by the iteration value in the for loop.

library("xlsx")
library("gdata")
library("rJava")


setwd("C:/Users/Owner/Desktop/FolderDatabase")
getwd()

files = list.files(pattern = "\\.xlsx")
#View(files)

dfList <- list()
for (i in seq_along(files)){
dfList[[paste0("excel",i)]] <- read.xlsx(files[i], sheetIndex = 1)
}


# Calling the xlsx lists that were created from the directory
dfList$excel1
dfList$excel2
dfList$excel3
dfList$excel4

If the xlsx file is named myname1.xlsx, I would like the list to be named myname1.

ryry

1 Answer

Rather than initializing dfList as empty, try an approach without a for loop:

dfList <- lapply( files, read.xlsx, sheetIndex = 1)
names(dfList) <- gsub("^.+/|\\.xlsx", "", files)

Or just:

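# simplify = FALSE stops sapply from collapsing the data frames;
# the names are the file names as given, including ".xlsx"
dfList <- sapply( files, read.xlsx, sheetIndex = 1, simplify = FALSE)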

The first alternative in that two-part pattern is there because I usually work with full file paths, although in your case it's probably not needed. The second alternative, after the "OR" ("|"), is needed to strip the extension.
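
To see what each half of the pattern removes, here is a small sketch on made-up file names (the paths are hypothetical and nothing is read from disk). It also shows `tools::file_path_sans_ext()` combined with `basename()` as one possible alternative; that pairing is my substitution, not part of the original answer.

```r
# Hypothetical file names, for illustration only
files <- c("myname1.xlsx", "C:/some/folder/myname2.xlsx")

# Pattern from the answer: strip a leading path OR the ".xlsx" extension
nms_gsub <- gsub("^.+/|\\.xlsx", "", files)

# Alternative using helpers that ship with R
nms_base <- tools::file_path_sans_ext(basename(files))

print(nms_gsub)
print(nms_base)
```

Either vector can then be assigned with `names(dfList) <- ...` so that `dfList$myname1` refers to the data read from myname1.xlsx.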

IRTFM
  • Isn't `sapply` needed here as `lapply` won't return the names of a vector? I.e. `sapply(files, I, simplify=FALSE)` vs `lapply(files, I)` – thelatemail Mar 30 '16 at 00:51
  • `lapply` was chosen because it is guaranteed to deliver a list. I don't think `sapply` and `lapply` handle naming of list differently, but I've been wrong about things before. I was passing a vector of file names to be read. I don't think `I` would be helpful. – IRTFM Mar 30 '16 at 01:45
  • `I` was just an example function - it seems `s/lapply` makes a difference - `sapply(c("one","two"), function(x) data.frame(a=1), simplify=FALSE)` vs. `lapply(c("one","two"), function(x) data.frame(a=1) )` – thelatemail Mar 30 '16 at 02:10
  • Thank you both for the help. I am new to R and programming in general and am wondering why it would be better to use the xlsx files as lists in R instead of assigning them as data frames. I have seen this warned against previously but don't understand why. The code I wrote uses data.table for reformatting all of the xlsx files so I would use the approach of assigning all xlsx files as data frames instead of lists. – ryry Mar 30 '16 at 02:39
  • Because then you can work on them as a group rather than repeating almost the same code over and over again. – IRTFM Mar 30 '16 at 03:23
  • @thelatemail: I see that you are right. Thanks for the lesson. – IRTFM Mar 30 '16 at 03:25
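
The naming difference discussed in the comments can be checked without any xlsx files. This is a minimal sketch built on the example from the comments; `make_df` and the derived column `b` are hypothetical stand-ins for `read.xlsx` and the real formatting code.

```r
inputs <- c("one", "two")

# Stand-in for read.xlsx: returns a small data frame for any input
make_df <- function(x) data.frame(a = 1)

from_lapply <- lapply(inputs, make_df)
from_sapply <- sapply(inputs, make_df, simplify = FALSE)

# lapply on an unnamed character vector gives an unnamed list;
# sapply uses the character values themselves as names
print(names(from_lapply))
print(names(from_sapply))

# Working on the data frames as a group: one formatting step applied to all
formatted <- lapply(from_sapply, function(df) {
  df$b <- df$a * 2
  df
})
print(formatted$two)
```

Keeping the data frames in one named list is what lets the last `lapply` call run the same step over every file at once.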