Newbie here. I have 1000 compressed CSV files that I need to read and row bind. My problem is similar to this one, but with two differences:
a) File names are of different lengths and not sequential, in this form:
`members_[name of company]_[state code].csv`
I have two vectors, `company` and `states`, with the required codes, so I've built a vector of all the files I need with this code:
```r
combinations <- expand.grid(company, states)
csvfiles <- paste0("members_", combinations$Var1, "_",
                   combinations$Var2, ".csv")
```
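A minimal sketch of the same idea that keeps each company/state pair in one row, so the zip name and the CSV name are built side by side. It assumes, from the `members_GE_FL.zip` / `members_GE_FL.csv` example further down, that each zip is named like the CSV it holds; the `company` and `states` values here are placeholders. `stringsAsFactors = FALSE` keeps the columns as plain character vectors instead of factors.

```r
# Placeholder codes: replace with the real `company` and `states` vectors
company <- c("GE", "IBM")
states  <- c("FL", "TX")

# One row per company/state combination (20 x 50 = 1000 rows with the real data)
files <- expand.grid(company = company, state = states,
                     stringsAsFactors = FALSE)

# Assumption: each zip is named like the CSV it contains
files$csv <- paste0("members_", files$company, "_", files$state, ".csv")
files$zip <- paste0("members_", files$company, "_", files$state, ".zip")

head(files)
```

`expand.grid()` varies the first argument fastest, so the rows come out as GE/FL, IBM/FL, GE/TX, IBM/TX.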
So `csvfiles` has all the file names I need (20 companies × 50 states). But I'm lost as to how to cycle through all the zip files. There are 10 other CSVs inside those zip files, but I only need the ones described above.
b) When decompressed, the files expand to a directory structure such as this:
`/files/member_database/members/state/members_[name of company]_[state code].csv`
But when I try to read the CSV from the zip file using

```r
data <- read.csv(unz("members_GE_FL.zip", "members_GE_FL.csv"), header = FALSE, sep = ":")
```

it returns a "cannot open connection" error. Adding the path, e.g. `./files/member_database/members/state/members_GE_FL.csv`, doesn't work either.
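`unz()` needs the file's name exactly as it is stored inside the archive, which for this layout is probably the whole relative path (zip entries are usually stored without a leading `/` or `./`). One hedged way to avoid guessing is to list the archive with `unzip(list = TRUE)` and pick the entry whose base name matches. `find_entry()` and `read_member()` are hypothetical helper names; `header = FALSE, sep = ":"` is copied from the call above.

```r
# Return the single entry in `entries` whose file name (ignoring folders) is csv_name
find_entry <- function(entries, csv_name) {
  hit <- entries[basename(entries) == csv_name]
  if (length(hit) != 1) {
    stop("expected exactly one entry named ", csv_name,
         ", found ", length(hit))
  }
  hit
}

# Read one CSV from a zip, wherever it sits in the archive's folder tree
read_member <- function(zipfile, csv_name) {
  entries <- utils::unzip(zipfile, list = TRUE)$Name  # names as stored in the zip
  inner <- find_entry(entries, csv_name)
  utils::read.csv(unz(zipfile, inner), header = FALSE, sep = ":")
}
```

Running `unzip("members_GE_FL.zip", list = TRUE)` on its own also shows the stored names, which is a quick way to see what string `unz()` expects.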
Also, I'm not sure whether a command like `read.csv(unz(csvfiles...` would read every file named in `csvfiles`, and I can't tell whether my problem comes from the issue above or from the command being wrong altogether.
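`unz()` opens one file in one archive per call, so it can't take the whole `csvfiles` vector at once. A common R pattern is to read each file inside `lapply()` and bind the pieces once at the end with `do.call(rbind, ...)`. This is a minimal sketch, assuming a data frame `files` with one row per company/state and character columns `company`, `state`, `csv` and `zip`, zips sitting in `dir`, and CSVs that all share the same column layout; `read_all_members()` is a hypothetical name. The added `company` and `state` columns record where each row came from.

```r
# Read every members CSV listed in `files` and row-bind the results.
# Assumes `files` has character columns company, state, csv and zip.
read_all_members <- function(files, dir = ".") {
  pieces <- lapply(seq_len(nrow(files)), function(i) {
    zipfile <- file.path(dir, files$zip[i])
    if (!file.exists(zipfile)) {
      warning("skipping missing archive: ", zipfile)
      return(NULL)
    }
    # Find the CSV by base name, whatever folders it is stored under
    entries <- utils::unzip(zipfile, list = TRUE)$Name
    inner <- entries[basename(entries) == files$csv[i]]
    if (length(inner) != 1) {
      warning("no unique ", files$csv[i], " in ", zipfile)
      return(NULL)
    }
    df <- utils::read.csv(unz(zipfile, inner), header = FALSE, sep = ":")
    df$company <- rep(files$company[i], nrow(df))  # source of each row
    df$state <- rep(files$state[i], nrow(df))
    df
  })
  pieces <- Filter(Negate(is.null), pieces)  # drop skipped files
  if (length(pieces) == 0) return(data.frame())
  do.call(rbind, pieces)
}
```

Usage would be `all_members <- read_all_members(files)`. For 1000 pieces, `data.table::rbindlist()` or `dplyr::bind_rows()` are faster drop-in alternatives to `do.call(rbind, ...)`.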
Any help is appreciated: insights, docs I should look at, etc. Again, I'm NOT trying to get people to do my work. As I type, I have 37 tabs open (many from SO) and have already spent 22 hours on this alone. I've learned from this post and others how to read a file within a ZIP, and from this post how to extract and import data. Still, I can't piece it all together. I only started with R a few months ago and have no prior programming experience.