Short version: I'm trying to combine a bunch of .csv files (each in a different subfolder of my working directory), and also add a column name indicating what subfolder they are from.
Longer: I'm trying to import a bunch of .csv files in directory subfolders that all are entitled "listings.csv", with subfolders that have the particular date those rental listings are for (for ex. Data/2022-04-12/listings.csv).
There is no existing date/label column in these .csv files (Edit: the only date info is the parent folder--which is a directory subfolder), so I'm hoping to import them, add the column name from their respective folder, and combine them so I can use them together, as a big dataset with a new "Date" column. (And alas, I did not create these files, or else I wouldn't be in this situation!) I know how to combine them, I know how to rename a singular file, and I know how to add a column to one file, but I'm hoping to batch-do this as I have ~20 subfolders. The csv file to folder is 1:1, and each csv file is called "listings.csv", so I was hoping this would be...relatively easy? (Alas.)
I figured out how to import the files from the subfolder, but cannot figure out how to automate the renaming of ~20 files from their respective folder.
For ex, this works, but then it joins without making dates (in the name of the subfolder) clear:
airbnb_listings <- list.files(path = "/Users/cevieth/Desktop/*MTL Rental Research/Rstats Coding/StatsCan and Air BnB Coding/Inside Air BnB Data",
pattern = "listings.csv",
recursive = TRUE,
full.names = TRUE) %>%
lapply(read_csv) %>%
reduce(full_join, by = "id")
And I can get a list of the subfolders (so, a list of dates) because I've reviewed this question (renaming file based on folder names with R), but it doesn't seem to exactly hit what I need--it involves filtering a particular prefixed folder,
subfolders <- list.dirs(path = "/Users/cevieth/Desktop/*MTL Rental Research/Rstats Coding/StatsCan and Air BnB Coding/Inside Air BnB Data", full.names = F, recursive = F)
And I can get a list of .csv file pathnames this way:
airbnb_files <- list.files(path = "/Users/cevieth/Desktop/*MTL Rental Research/Rstats Coding/StatsCan and Air BnB Coding/Inside Air BnB Data", pattern="*listings.csv", recursive = TRUE, full.names=TRUE)
airbnb_files
But I just cannot figure out how to end up with a tibble or df with a column that says the parent folder name (directory subfolder that has the listing.csv date), so I can compare data across different dates.
I think it's just brainfog getting to me (coding while sick, argh!)--but this should be possible, right?