1

Having trouble trying to read in multiple .xlsx files to R from the same directory. I keep getting the following error.

"Error in path.expand(file) : argument "file" is missing, with no default"

My code is as follows.

require(.xlsx)
Files=list.files(path="I:/Marcs_Discretinization_try_1/Attempt1/Actual     Data", pattern=".xlsx")
sapply(Files, read.xlsx2(sheetIndex=8))

The output of object Files looks like this which seemingly does not have the attached path.

 [1] "2015-B1-2OR.xlsx"    "2015-B1-OR10-B.xlsx" "2015-B1-OR10.xlsx"   "2015-B1-OR19.xlsx"   "2015-B2-OR19.xlsx"  
 [6] "2015-O1-2OR.xlsx"    "2015-O1-OR10-B.xlsx" "2015-O1-OR10.xlsx"   "2015-O2-2OR.xlsx"    "2015-O2-OR10-B.xlsx"
[11] "2015-O2-OR10.xlsx"   "2015-X1-2OR.xlsx"    "2015-X1-OR10-B.xlsx" "2015-X1-OR10.xlsx"   "2015-X2-2OR.xlsx"   
[16] "2015-X2-OR10-B.xlsx" "2015-X2-OR10.xlsx"  
MockCommunity1
  • 31
  • 1
  • 1
  • 10
  • If you want the full path returned, use `full.names = TRUE` in `list.files` – Ritchie Sacramento Nov 18 '15 at 03:40
  • This feels closer however the same error is returned: `Error in path.expand(file) : argument "file" is missing, with no default` – MockCommunity1 Nov 18 '15 at 03:47
  • Your sapply syntax is wrong, additional arguments to read.xlsx2 come after the name of the function. Try looking at some examples. – joran Nov 18 '15 at 03:50
  • also, your use of `sapply` is incorrect which is the source of the error. Try `sapply(Files, function(x) read.xlsx2(x, sheetIndex=8)` – Ritchie Sacramento Nov 18 '15 at 03:50
  • This helped @Jay. The data files are being read in as a single massive dataframe where each row is a vector as opposed to individual dataframes. – MockCommunity1 Nov 18 '15 at 04:09
  • Any idea on how to make them individual dataframes? – MockCommunity1 Nov 18 '15 at 04:33
  • @Dilliplaine33 Many would argue that it would be better to keep them as a list but if you want to assign individual data frames, see http://stackoverflow.com/a/11433532/2835261 – Ritchie Sacramento Nov 18 '15 at 04:47
  • Simpler: `sapply(Files, read.xlsx2, sheetIndex=8)`. For your last question, read `help(sapply)` and you may find your answer in `simplify`. – r2evans Nov 18 '15 at 04:51
  • @r2evans, even closer! simplify=False makes a more useful dataframe. I am now fumbling with this code to generate individual dataframes but am having errors. `for (i in 1:length(Files)[1]) {assign(paste0("DF", i))=read.xlsx2(Files[i],sheetIndex=1, header=T)}` _Error in assign(paste0("DF", i)) = read.xlsx2(Files[i], sheetIndex = 1, : target of assignment expands to non-language object_ – MockCommunity1 Nov 18 '15 at 05:33
  • Found it!! `for (i in 1:length(Files)[1]) + {assign(paste0("DF", i), read.xlsx2(Files[i],sheetIndex=1, header=T))}` does the trick. My problem was in the layout of assign. – MockCommunity1 Nov 18 '15 at 05:46
  • Though it may give you something with which you can work, there are several things about that method that are actively discouraged as well as unnecessary. @Jay suggested keeping them in one data.frame, and that may work better if you were to `cbind` a column indicating the different sources. Even without that, I am inferring that you will be doing similar things to each data.frame, so another option would be to use `simplify=FALSE` (which was my point earlier) and keep them unique in a list; that way, you can apply one function to all of them in one fell swoop. Also, why `length(Files)[1]`? – r2evans Nov 18 '15 at 06:12
  • @r2evans `length(Files)[1]` is an artifact of earlier code where I needed to treat something for each row hence the [1]. That may be redundant here. You are very correct in your thinking that I will be doing the same thing to each data frame so I have become very intrigued by simplify=FALSE. I have never accessed data in this fashion, how can a function be applied to all of them within themselves? – MockCommunity1 Nov 18 '15 at 06:35

2 Answers2

3

You want to merge all Excel files in a folder?

library(xlsx)
setwd("C:/Users/rshuell001/Desktop/excel_files")
data.files = list.files(pattern = "*.xlsx")
data <- lapply(data.files, function(x) read.xlsx(x, sheetIndex = 1))

for (i in data.files) {
    data <- rbind(data, read.xlsx(i, sheetIndex = 1))
}
ASH
  • 20,759
  • 19
  • 87
  • 200
2

I use a lapply and bind_rows usually. Comfortable and fast.

require(tidyverse)
require(magittr)
require(readxl)

path <- "I:/project/raw_data"
  
url_xlsx <- list.files(path, pattern = "*.xlsx", recursive = TRUE)
  
read_xlsx_files <- function(x){
   df <- read_xlsx(path = paste(path, x, sep = "/"))
   return(df)
 }
  
df <- lapply(url_xlsx, read_xlsx_files ) %>%
   bind_rows()
user2948714
  • 671
  • 2
  • 9
  • 13
ToWii
  • 590
  • 5
  • 8