I want to read all .csv files from a large zip file into R without unzipping the file. I understand how to do this for each file individually, as follows.

df1<- read_csv(unz("myfile.zip", "file1.csv"))
df2<- read_csv(unz("myfile.zip", "file2.csv"))

df3<- bind_rows(df1, df2)

How can I do this for all files in the zip file and combine them into a single data frame? I would like to do something like this:

temp = list.files(path= "myfile.zip", pattern="*.csv", full.names = T)

myfiles = map_df(temp, read_csv)

The zip file I am interested in contains major agencies' award transaction data for fiscal year 2020. The link to the zip file is https://files.usaspending.gov/award_data_archive/FY2020_All_Contracts_Full_20221008.zip. I do not want to keep the unzipped files in my working directory because they take up too much room on my computer.

Mel G

1 Answer

Try this:

temp <- unzip("myfile.zip", list = TRUE)$Name
temp <- grep("csv$", temp, value = TRUE)
df <- bind_rows(lapply(temp, function(fn) read_csv(unz("myfile.zip", fn))))
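
To try the pattern above without downloading the large archive, here is a minimal sketch that builds a small throwaway zip and reads it back the same way. The names `demo_dir`, `zip_path`, `csv_names`, the `amount` column and the `source_file` column are made up for illustration, and `zip()` from utils assumes an external zip program is available on your system.

```r
library(readr)  # read_csv()
library(dplyr)  # bind_rows()

# Build a small throwaway zip so the sketch is self-contained.
# demo_dir, zip_path and the file names are hypothetical.
demo_dir <- file.path(tempdir(), "zip_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, amount = c(10, 20)),
          file.path(demo_dir, "file1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, amount = c(30, 40)),
          file.path(demo_dir, "file2.csv"), row.names = FALSE)
zip_path <- file.path(demo_dir, "demo.zip")
# "-j" drops directory paths inside the archive, "-q" keeps zip quiet.
# Assumption: an external `zip` executable is on the PATH.
zip(zip_path, files = file.path(demo_dir, c("file1.csv", "file2.csv")),
    flags = "-j -q")

# List the archive contents without extracting anything.
csv_names <- grep("\\.csv$", unzip(zip_path, list = TRUE)$Name, value = TRUE)

# Read each member through a unz() connection. Naming the list lets
# bind_rows() record which file every row came from.
frames <- lapply(csv_names, function(fn) read_csv(unz(zip_path, fn)))
names(frames) <- csv_names
combined <- bind_rows(frames, .id = "source_file")
```

The `.id` argument is optional. It only adds a column with the source file name, which can help when the files in the archive share the same columns.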
r2evans
  • Thanks for the answer. Should I delete the question since you labeled it as a duplicate? I saw the other similar questions you referenced, but I still could not figure out how to do what I wanted. Perhaps I just did not fully see or understand the other posts. – Mel G Nov 03 '22 at 14:11
  • (1) I didn't label it a dupe, M-- did. (2) Over to you. I almost dupe-hammered it, but I found it a trivial answer: you use `read_csv(unz(..))` which is the way to read without unzipping, the only thing you needed was `bind_rows(lapply(..))` (might also use `purrr::map_dfr`, I'm not a regular `purrr` user). Really, this question is about combining a list-of-frames which is likely a dupe elsewhere. Again, over to you. I see no harm in keeping it, but as a dupe it won't percolate to the top of search lists if retained. There are no downvotes (yet), so no harm. – r2evans Nov 03 '22 at 14:14
  • When I run the second line of code, I get: Error in is.factor(x) : object 'files' not found – Mel G Nov 03 '22 at 14:16