I have a directory of thousands of CSV files which, fortunately, follow a strict naming convention. I am trying to write a function that groups into separate data frames all of the files that end with the same last 7 digits.

I have a vector (u) of the 7 digit patterns to match:

v <- list.files(wd, full.names = FALSE)
u <- unique(substr(v, 9, 15))

Now I need to match each element of u against the file names in v, and combine all of the matching files into a single data frame for each value of u.

I've tried a few things with no success:

#only matches first in list
files <- list.files(pattern=u)

#makes a list of vectors with the same contents
lapply(v, function(x) list.files(pattern=u)) 

#nope
data <- data.frame()
  for (i in 1:length(u)) {
    data <- rbind(data, read.csv(v[files]))
    }

A nudge or shove in the right direction would be greatly appreciated.

Thanks!

Phoebe

1 Answer

Nested calls to lapply should do it. The outer lapply loops through the unique patterns (u). For each pattern, the inner lapply loops through all matching files (list.files(pattern=pattern)) and reads each one in with read.table. The results are then bound together into a single data.frame with bind_rows from the dplyr package (you can also use rbind, but I find bind_rows simpler), and that data.frame is returned to the outer lapply.

The result should be a list of data.frames, each of which contains the merged contents of all .csv files that matched a 7 digit pattern.

list_of_file_sets <- lapply(u, function(pattern) {
    # Read every file in the working directory whose name contains this pattern
    file_set <- lapply(list.files(pattern = pattern), function(file) {
        read.table(file, sep = ',', header = TRUE, stringsAsFactors = FALSE)
    })
    # Stack the files for this pattern into one data.frame and return it
    dplyr::bind_rows(file_set)
})
names(list_of_file_sets) <- u # Optionally set names of list to 7 digit pattern
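
As a variation on the same idea, the grouping can be done on the file names themselves with split, so that each file is assigned to a group by the characters at positions 9 to 15 rather than by a regular expression that could also match elsewhere in the name. This is a minimal sketch in base R only. The temporary directory, the demo file names (siteA___1234567.csv and so on) and the column x are hypothetical and exist only to make the example self-contained; substitute your own wd.

```r
# Hypothetical demo data: three small CSV files in a temporary directory.
# Characters 9-15 of each name hold the 7-digit key, as in the question.
wd <- file.path(tempdir(), "csv_demo")
dir.create(wd, showWarnings = FALSE)
demo_names <- c("siteA___1234567.csv", "siteB___1234567.csv", "siteC___7654321.csv")
for (f in demo_names) {
  write.csv(data.frame(x = 1:2), file.path(wd, f), row.names = FALSE)
}

# List the files and derive the grouping key from the file name
v <- list.files(wd, pattern = "\\.csv$", full.names = FALSE)
key <- substr(v, 9, 15)

# One character vector of full paths per key
file_groups <- split(file.path(wd, v), key)

# Read and row-bind the files of each group; the list is named by key
list_of_file_sets <- lapply(file_groups, function(paths) {
  do.call(rbind, lapply(paths, read.csv, stringsAsFactors = FALSE))
})
```

Here do.call(rbind, ...) stands in for dplyr::bind_rows so the sketch has no package dependencies; it assumes every file in a group has the same columns.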
divibisan
  • Thanks so much for the reply, divibisan! Your description sounds exactly like what I want to do, but I have not yet been able to make it work. My vector of unique patterns is U and my vector of files is V. Also, I'm getting an "unexpected }" error near file_set. Any thoughts? – Phoebe Aug 17 '18 at 17:23
  • Oops, missing a close parenthesis. Try again – divibisan Aug 17 '18 at 17:27
  • So close! Current error is "cannot allocate vector of size 122 Kb," I will see if I can work my way around this. – Phoebe Aug 17 '18 at 19:44
  • Take a look at [this question that discusses that error](https://stackoverflow.com/questions/5171593/r-memory-management-cannot-allocate-vector-of-size-n-mb). That's strange, because that error usually happens with large objects (hundreds of Mb to Gb) that fill up the computer's memory. You might just want to restart your computer and see if that helps – divibisan Aug 17 '18 at 19:47
  • Indeed, it's a few Gb. I can thin out half the files and try again. If that doesn't work I will proceed in batches. Will report back and mark your answer accordingly. Thank you so much for your help! – Phoebe Aug 17 '18 at 19:54
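
The batch approach mentioned in the last comment could be sketched as follows: process one 7-digit group at a time and write each combined data frame to disk, so that only one group needs to fit in memory at once. This is an illustration under stated assumptions, not a tested fix for the error above. The directories, demo file names and the column x are hypothetical, and whether it helps depends on how large each individual group is.

```r
# Hypothetical demo data and output directory inside tempdir()
wd <- file.path(tempdir(), "csv_batch_demo")
out_dir <- file.path(tempdir(), "csv_batch_out")
dir.create(wd, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
for (f in c("siteA___1234567.csv", "siteB___1234567.csv", "siteC___7654321.csv")) {
  write.csv(data.frame(x = 1:2), file.path(wd, f), row.names = FALSE)
}

# Group full paths by the 7-digit key at positions 9-15 of the file name
v <- list.files(wd, pattern = "\\.csv$")
file_groups <- split(file.path(wd, v), substr(v, 9, 15))

# Handle one group per iteration instead of building one big list
for (key in names(file_groups)) {
  combined <- do.call(rbind, lapply(file_groups[[key]], read.csv,
                                    stringsAsFactors = FALSE))
  write.csv(combined, file.path(out_dir, paste0(key, ".csv")), row.names = FALSE)
  rm(combined)  # drop the reference so the memory can be reclaimed
}
```

Each combined file can then be read back individually with read.csv when it is needed.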