3

I'm fairly new to R, so my apologies if this is a very basic question.

I'm trying to read two Excel files in, using the list.files(pattern) method, then using a for loop to bind the files and replace values in the bound file. However, the output that my script is producing is the output from only one file, meaning that it is not binding.

The file names are fact_import_2020 and fact_import_20182019.

FilePath <- "//srdceld2/project2/"
FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = FALSE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

FileCount <- length(FileNames)

for(i in 1:FileCount){
  MOH_TotalHC_1 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC_2 <- read_excel(paste(FilePath, "/", FileNames[i], sep = ""), sheet = 1, range = cell_cols("A:I"))
  MOH_TotalHC <- rbind(MOH_TotalHC_1, MOH_TotalHC_2)
  MOH_TotalHC <- MOH_TotalHC[complete.cases(MOH_TotalHC), ]
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • FYI, in SE [formatting](https://stackoverflow.com/editing-help), code blocks (1) need just the three backticks on a line, no code on that line; (2) they benefit from adding a language on that line of text, `lang-r` here; (3) the close-fence (at the end of code) also needs to be all on one line, nothing before or after it. It really helps when your code *looks* intuitive and separated, even if mostly a stylistic thing. Thanks for starting with what you did, though your last edit just wiped out my fixes ... see my recent edit for further suggestion. – r2evans Apr 06 '20 at 21:12
  • ... but it helps to have self-sufficient code, and this is not. At a minimum, you're missing a close-brace `}` in your `for` loop. Also, is there a reason you explicitly do `full.names=FALSE` and then `paste(FilePath,"/",FileNames[i],sep="")`? That's completely unnecessary (as Sathish's answer is suggesting). – r2evans Apr 06 '20 at 21:15

2 Answers2

3

use full.names = TRUE in list.files().

After this, make sure FileNames has full path of the files.

Then loop through the filenames, instead of filecount.


I think, you are trying to do this. I am guessing here. Please see below.

You are getting data from one file, because you are overwriting the data from file-2 with data from file-1. The for() loop is indicating it.

FileNames <- list.files(path = FilePath, pattern = "fact_import_20", all.files = FALSE,
                        full.names = TRUE, recursive = FALSE,
                        ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

# list of data from excell files
df_lst <- lapply(FileNames, function(fn){
  read_excel(fn, sheet = 1, range = cell_cols("A:I"))
})

# combine both data
MOH_TotalHC <- do.call('rbind', df_lst)
# complete cases
MOH_TotalHC[complete.cases(MOH_TotalHC), ]
Sathish
  • 12,453
  • 3
  • 41
  • 59
  • Thanks for your answer! I tried this and the output still had the same issue. I'm repeating the same operation in MOH_TotalHC_2 to read in the second Excel file. I tried changing the column range, but it won't let me rbind if the column ranges are different. –  Apr 06 '20 at 21:39
0

The potential solution is below. This solution is taken from here and seems like a duplicate question.

Potential solution:

library(readxl)
library(data.table)

#Set your path here
FilePath <- "//srdceld2/project2/"

#Update the pattern to suit your needs. Currently, its just set for XLSX files
file.list <- list.files(path = FilePath, pattern = "*.xlsx", full.names = T)
df.list <- lapply(file.list, read_excel, sheet = 1, range = cell_cols("a:i"))
attr(df.list, "names") <- file.list
names(df.list) <- file.list
setattr(df.list, "names", file.list)

#final data frame is here
dfFinal <- rbindlist(df.list, use.names = TRUE, fill = TRUE)

Assumptions and call outs:

  1. The files in the folder are similar file types. For example xlsx.
  2. The files could have different set of columns and NULLs as well.
  3. Note that the order of the columns matter and so if there are more columns in new file the number of output columns could be different.

Note: Like @Sathish, I am guessing what the input could look like