
I've drawn on several posts to get this far (such as "R list files with multiple conditions" and "How can I read multiple files from multiple directories into R for processing?"), but I still can't accomplish what I need in R.

I have many .csv files distributed in multiple subdirectories that I want to read in and then save as separate objects to the corresponding basename. The end result will be to rbind each of those files together. Here's sample dir structure and some of what I've tried:

./DATA/Cat_Animal/animal1.csv
./DATA/Dog_Animal/animal2.csv
./DATA/Dog_Animal/animal3.csv
./DATA/Dog_Animal/animal3.1.csv

#read in all csv files
files <- list.files(path="./DATA", pattern="\\.csv$", full.names=TRUE, recursive=TRUE)

But this returns every .csv file in every subdirectory. I want only specific files (animalX.csv) in subdirectories whose names match the pattern X_Animal, something like this:

files <- dir(path=paste0("./DATA/", pattern="*+_Animal"), recursive=TRUE, full.names=TRUE, pattern="animal+.*csv")

Once I get my list of files, I want to read each of them in and save each to the corresponding file's basename. So the file named animal1.csv would be saved to animal1. I think I need to use the function basename() somewhere in a loop but not sure how.

Help would be very much appreciated. I've spent a lot of time trying out various options with little progress.

KNN

3 Answers


This question is really two questions; consider splitting them up. For the last part, how to rbind a list of data.frames together, try:

finalDf = do.call(rbind, result)
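As a minimal sketch of what that call does, assuming `result` is a list of data frames that all share the same columns (the list below and its `id` and `weight` columns are hypothetical stand-ins, not data from the question):

```r
# Hypothetical stand-in for `result`: data frames with identical columns
result <- list(
  animal1 = data.frame(id = 1:2, weight = c(4.1, 3.8)),
  animal2 = data.frame(id = 3:4, weight = c(20.5, 18.2))
)

# do.call() passes each list element to rbind() as a separate argument
finalDf <- do.call(rbind, result)

# Row names are derived from the list names; reset them if they are not wanted
rownames(finalDf) <- NULL
```

If the data frames do not share the same column names, rbind() stops with an error, so it is worth checking the columns before binding.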

You'll likely need to use str_split() from the stringr package to extract the parts of the file path you need. You could also use str_extract() regular expressions.
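For the specific case of file paths, base R may already be enough. The sketch below swaps the stringr functions for basename(), dirname(), and tools::file_path_sans_ext(); the variable names `files`, `objNames`, and `subDirs` are made up for illustration, and the two paths are taken from the question's sample layout.

```r
files <- c("./DATA/Cat_Animal/animal1.csv",
           "./DATA/Dog_Animal/animal3.1.csv")

# File name without the directory or the extension
objNames <- tools::file_path_sans_ext(basename(files))
objNames
# [1] "animal1"   "animal3.1"

# Name of the subdirectory that holds each file
subDirs <- basename(dirname(files))
subDirs
# [1] "Cat_Animal" "Dog_Animal"
```

Note that only the final extension is removed, so animal3.1.csv becomes animal3.1 rather than animal3.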

David Pedack
  • Okay, I'll delete the second half of the question and edit it to focus on looping over and reading in the files and saving each to its corresponding basename. – KNN Sep 23 '19 at 18:12

I think I found a work-around for the short term because luckily I only have a few subdirectories currently.

myFiles1 <- list.files(path = "./DATA/Cat_Animal/", pattern = "^animal.*\\.csv$")

processFile <- function(f) {
  read.csv(file = paste0("./DATA/Cat_Animal/", f))
}
# lapply() returns a list of data frames; sapply() would try to simplify the result
result1 <- lapply(myFiles1, processFile)

#then do it again for the next subdir:
myFiles2 <- list.files(path = "./DATA/Dog_Animal/", pattern = "^animal.*\\.csv$")

processFile <- function(f) {
  read.csv(file = paste0("./DATA/Dog_Animal/", f))
}
result2 <- lapply(myFiles2, processFile)

# do.call() takes a single list of arguments, so combine the two lists first
finalDf <- do.call(rbind, c(result1, result2))

I know there is a better way, but I can't figure out the pattern matching for the subdirectories! It's so easy in Unix.

KNN
  • are you using regular expressions in unix? if so, `str_extract()` is probably exactly what you are looking for. – David Pedack Sep 23 '19 at 20:41
  • Thanks. In unix the path to read all matching files in subdirectories would simply be: ./DATA/*_Animal/animal*.csv. Based on the str_extract() documentation, it's not that easy, and it's unclear how to combine it with list.files(), but I will keep trying – KNN Sep 24 '19 at 01:07
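
One possible way to use that Unix-style path directly is base R's Sys.glob(), which expands shell wildcards rather than regular expressions. The sketch below is only an illustration: it builds a throwaway copy of the question's layout in a temporary directory (the `root` variable and the extra notes.txt file are assumptions added for the demo) so that it runs anywhere.

```r
# Build a throwaway copy of the question's layout so the example runs anywhere
root <- file.path(tempfile("glob_demo"), "DATA")
dir.create(file.path(root, "Cat_Animal"), recursive = TRUE)
dir.create(file.path(root, "Dog_Animal"), recursive = TRUE)
file.create(file.path(root, "Cat_Animal", c("animal1.csv", "notes.txt")))
file.create(file.path(root, "Dog_Animal",
                      c("animal2.csv", "animal3.csv", "animal3.1.csv")))

# Sys.glob() expands wildcards in both the directory and the file name part
files <- Sys.glob(file.path(root, "*_Animal", "animal*.csv"))
basename(files)
```

With the real data, the equivalent call would presumably be Sys.glob("./DATA/*_Animal/animal*.csv"), run from the directory that contains DATA.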

You can simply do it in two steps: first list the matching subdirectories, then list the matching files inside them.

# pattern is a regular expression, not a shell glob
a <- list.files(path="./DATA", pattern="_Animal$", full.names=TRUE, recursive=FALSE)
a
#[1] "./DATA/Cat_Animal" "./DATA/Dog_Animal"

files <- list.files(path=a, pattern="animal", full.names=TRUE)
files
#[1] "./DATA/Cat_Animal/animal1.txt" "./DATA/Dog_Animal/animal2.txt" "./DATA/Dog_Animal/animal3.txt"
#[4] "./DATA/Dog_Animal/animal4.txt"

In the first step, make sure to use full.names = TRUE and recursive = FALSE. You need full.names = TRUE to get the full path rather than just the name; otherwise the second step has no path to animal*.csv. And recursive = TRUE would return nothing, since Dog_Animal and Cat_Animal are folders, not files.
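
To connect the two steps back to the original goal (reading each file and storing it under its basename), here is a rough sketch. It assumes a named list is an acceptable substitute for separate objects; the temporary `root` directory, the `dataList` name, and the `id` and `species` columns are hypothetical and exist only so the example runs on its own.

```r
# Throwaway copy of the question's layout with hypothetical contents
root <- file.path(tempfile("demo"), "DATA")
for (d in c("Cat_Animal", "Dog_Animal")) {
  dir.create(file.path(root, d), recursive = TRUE)
}
write.csv(data.frame(id = 1, species = "cat"),
          file.path(root, "Cat_Animal", "animal1.csv"), row.names = FALSE)
write.csv(data.frame(id = 2, species = "dog"),
          file.path(root, "Dog_Animal", "animal2.csv"), row.names = FALSE)
write.csv(data.frame(id = 3, species = "dog"),
          file.path(root, "Dog_Animal", "animal3.csv"), row.names = FALSE)

# Step 1: subdirectories whose names end in "_Animal"
dirs <- list.files(path = root, pattern = "_Animal$", full.names = TRUE)

# Step 2: CSV files inside them whose names start with "animal"
files <- list.files(path = dirs, pattern = "^animal.*\\.csv$", full.names = TRUE)

# Read every file into a list named after each file's basename
dataList <- lapply(files, read.csv)
names(dataList) <- tools::file_path_sans_ext(basename(files))

# dataList$animal1 now holds the contents of animal1.csv
finalDf <- do.call(rbind, dataList)
```

If separate objects are really needed, list2env(dataList, envir = globalenv()) is one option, though a named list tends to be easier to loop over and to rbind.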

patL