
So I have this dataset. The problem is that the data came in a zip file which, once unzipped, contains 800+ folders. Each of those folders contains half a dozen or so files, only one of which holds data I can use. And when I say the file contains useful data, I mean it contains one row of data. One observation. One datapoint. I need a way to take all of those datapoints from their individual files and folders and put them together in a dataframe.

I've figured out that R has functions for moving files around, like file.copy, and that I can sort files using list.files. However, I haven't been able to find a way to do the same with directories/folders. I know how to create new directories, and put files into them, but not how to get files out of more than one directory at a time.

  • `file.rename` works on a directory like it does on a file. See https://stackoverflow.com/a/24376207/3358227 for discussions on reading multiple files (and then optionally combining them with `data.table::rbindlist`, `dplyr::bind_rows`, or `do.call(rbind.data.frame, ...)`). – r2evans Jul 06 '23 at 16:40
  • Are the embedded files `.xlsx` or `.csv`? If the latter, then you can likely just `cat` (shell prompt, not R) the files into a single one, likely needing a `grep` (again, shell prompt, not R) to filter out repeated column headers. – r2evans Jul 06 '23 at 16:42
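Building on the comments above: you don't actually need to move any files, because `list.files()` with `recursive = TRUE` walks every subfolder for you. Here is a minimal, self-contained sketch. The filename `data.csv`, the folder names, and the column names are all assumptions for illustration; adjust the `pattern` argument to match whatever identifies your one useful file per folder.

```r
# Build a small fake directory tree so the example is self-contained.
root <- file.path(tempdir(), "unzipped")
for (i in 1:3) {
  d <- file.path(root, sprintf("folder_%03d", i))   # hypothetical folder names
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(id = i, value = i * 10),     # the one-row "useful" file
            file.path(d, "data.csv"), row.names = FALSE)
  writeLines("not data", file.path(d, "notes.txt")) # a file to ignore
}

# The key step: recursive = TRUE searches all subdirectories at once,
# so there is no need to copy files out of their folders first.
paths <- list.files(root, pattern = "^data\\.csv$",
                    recursive = TRUE, full.names = TRUE)

# Read each one-row file, then stack the rows into a single data frame.
rows <- lapply(paths, read.csv)
combined <- do.call(rbind, rows)
print(combined)  # one row per folder
```

With 800+ files, `data.table::rbindlist(rows)` or `dplyr::bind_rows(rows)` (mentioned in the linked answer) will be noticeably faster than `do.call(rbind, ...)`, and both tolerate columns appearing in different orders across files.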
