1

I'm writing a loop script which involves reading a file from a workbook (using the package XLConnect). The challenge is that the file names contain characters (representing time) that I want to ignore.

For example, here are 3 paths to those files:

G://User//Documents//daily_data//Op_Schedule_20160520_132025.xlsx G://User//Documents//daily_data//Op_Schedule_20160521_142805.xlsx G://User//Documents//daily_data//Op_Schedule_20160522_103052.xlsx

I need to import hundreds of those files. I can easily account for the character string representing the date (e.g. 20160522), but not the time.

Is there a way to tell R to ignore some characters located in the file path? Here is how I was thinking of writing my script (the "???" is where i need help). I know a loop is probably not the most efficient way, but i'm open to suggestions, should you have any:

require(XLConnect)
path= "G://User//Documents//daily_data//Op_Schedule_"

wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")

scheduleList = rep(list(matrix(1,1,1)),length(wd.seq))

for(i in 1:length(wd.seq)) { 
wb = loadWorkbook(file= paste0(path,wd.seq[i],"???",".xlxs"))
scheduleList[[i]] = readWorksheet(wb,sheet='=SCHEDULE', header = TRUE)
}

`

Thanks for reading and suggestions, if any.

Mathieu

Mathieu
  • 67
  • 8

1 Answers1

1

I don't know if this is helpful, but if you want to read all the files in a certain directory (which it seems to me is what you're after), you can read all the filenames into a list using the list.files() function, for example

fileList <- list.files(""G://User//Documents//daily_data//")

And then load the xlsx files looping through the list with a for loop

for(i in fileList) {
    loadWorkbook(file = i)
    }

I haven't used the XLConnect function before so that exact code probably doesn't work, but the loop will iterate through all the files in that directory and so you can construct your loading call using the i variable for the filename (it won't be an absolute path though, so you might need to use paste to add the first part of the filepath)

I realize there might be other files in the directory that are not excel files, you could use grepl to select only files containg "OP_Schedule_"

fileListClean <- fileList[grepl("Op_Schedule_",fileList)]

or perhaps only selecting .xlsx files in the directory:

fileListClean <- fileList[grepl(".xlsx",fileList)]

Edit to fit your reply: Since you need to fit it to a sequence, you can do it as you did earlier:

wd.seq = format(seq(as.Date("2014-01-01"),as.Date("2016-12-31"),"days"),format="%Y%m%d")
wd.seq2 <- paste("Op_Schedule_", wd.seq, sep = "")

And then use grepl to only pick files starting with that extensions:

fileListClean <- fileList[grepl(paste(wd.seq2, collapse = "|"), fileList)]

Full disclosure: The last part i got from this SO answer: grep using a character vector with multiple patterns

Community
  • 1
  • 1
  • Takk fyrir Kari, it would have been a good idea if I needed all the files in the directory... Unfortunately, the script runs everyday and the time period needed (e.g. 100 days) rolls in time, so I don't want to pull all the files. – Mathieu May 31 '16 at 14:37
  • Hey Mathieu, might not be the most elegant solution, but you can check out the edited answer, should do what you wanted to. – Kári Gunnarsson Jun 01 '16 at 14:18
  • Sorry for the delayed reply Kari. It works great! I love this grepl function; I was not aware of it. For other users who might find this post useful: you have to add the start of the path before the fileList: – Mathieu Jun 08 '16 at 17:56
  • fileList = paste0("G://User//Documents//daily_data//", list.files(""G://User//Documents//daily_data//")) – Mathieu Jun 08 '16 at 17:57