0

Short version: I'm trying to combine a bunch of .csv files (each in a different subfolder of my working directory), and also add a column name indicating what subfolder they are from.

Longer: I'm trying to import a bunch of .csv files in directory subfolders that all are entitled "listings.csv", with subfolders that have the particular date those rental listings are for (for ex. Data/2022-04-12/listings.csv).

There is no existing date/label column in these .csv files (Edit: the only date info is the parent folder--which is a directory subfolder), so I'm hoping to import them, add the column name from their respective folder, and combine them so I can use them together, as a big dataset with a new "Date" column. (And alas, I did not create these files, or else I wouldn't be in this situation!) I know how to combine them, I know how to rename a singular file, and I know how to add a column to one file, but I'm hoping to batch-do this as I have ~20 subfolders. The csv file to folder is 1:1, and each csv file is called "listings.csv", so I was hoping this would be...relatively easy? (Alas.)

I figured out how to import the files from the subfolder, but cannot figure out how to automate the renaming of ~20 files from their respective folder.

For ex, this works, but then it joins without making dates (in the name of the subfolder) clear:

airbnb_listings <- list.files(path = "/Users/cevieth/Desktop/*MTL Rental Research/Rstats Coding/StatsCan and Air BnB Coding/Inside Air BnB Data", 
                       pattern = "listings.csv",
                       recursive = TRUE,
                       full.names = TRUE) %>% 
  lapply(read_csv) %>%  
  reduce(full_join, by = "id")                       

And I can get a list of the subfolders (so, a list of dates) because I've reviewed this question (renaming file based on folder names with R), but it doesn't seem to exactly hit what I need--it involves filtering a particular prefixed folder,

subfolders <- list.dirs(path = "/Users/cevieth/Desktop/*MTL Rental Research/Rstats Coding/StatsCan and Air BnB Coding/Inside Air BnB Data", full.names = F, recursive = F)

And I can get a list of .csv file pathnames this way:

airbnb_files <- list.files(path = "/Users/cevieth/Desktop/*MTL Rental Research/Rstats Coding/StatsCan and Air BnB Coding/Inside Air BnB Data", pattern="*listings.csv", recursive = TRUE, full.names=TRUE)
airbnb_files

But I just cannot figure out how to end up with a tibble or df with a column that says the parent folder name (directory subfolder that has the listing.csv date), so I can compare data across different dates.

I think it's just brainfog getting to me (coding while sick, argh!)--but this should be possible, right?

Rine
  • 1
  • 2
  • Does this do what you want: https://stackoverflow.com/questions/63072429/merging-csv-files-in-r-and-adding-a-column-with-file-names – MrFlick Nov 15 '22 at 20:28
  • @MrFlick: Unfortunately, it's what I'm aiming to do, but doesn't seem to be exactly my situation. (Unless I'm missing something, which is VERY possible!) I think I need to first batch rename a bunch of identically-named files ("listing.csv") from subfolders (not all in the same folder like that question or this question: https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once), add column with the new filename (directory subfolder name that that listings.csv is in), and merge--though I could be wrong on steps. (Thank you so much for the quick response!!!) – Rine Nov 15 '22 at 20:37
  • Specially look at this answer https://stackoverflow.com/a/66295120/2372064. Keep your `recursive=TRUE` in your list files. It doesn't really matter if the files are in the same directory or not if you have the path. Use `dirname()` rather than `basename()` to get the directory. No need to rename the file. – MrFlick Nov 15 '22 at 20:52
  • Thank you so much! That makes sense. I am now getting another error message: ` Error in `dplyr::bind_rows()`: ! Can't combine `price` and `price` . Backtrace: 1. files %>% map_dfr(read_csv, .id = "filename") 2. purrr::map_dfr(., read_csv, .id = "filename") 3. dplyr::bind_rows(res, .id = .id) 6. vctrs::vec_rbind(!!!dots, .names_to = .id) ` I'm hoping I can specify a column with a particular name on multiple sheets to be numeric...? – Rine Nov 16 '22 at 00:37

0 Answers0