
I've reviewed multiple StackOverflow questions and answers and still can't get a .zip file downloaded, unzipped, and loaded using only R.

When I download the .zip archive manually, I see that it contains multiple files, one of which, loan.csv, I need to analyze in R.

# set working directory
wd <- "/Users/myname/Documents/zip_folder"
setwd(wd)

zip_url <- "https://www.kaggle.com/wendykan/lending-club-loan-data/downloads/lending-club-loan-data.zip"

I'm getting an error with the first answer I found here:

library(utils)
temp <- tempfile()
download.file(zip_url, temp)
data <- read.table(unz(temp, "loan.csv"))
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
  cannot open zip file '/var/folders/b1/d481ykzd3j14kr8nkx8kn83m0000gn/T//RtmpcjmrIa/file932f730721c5'
unlink(temp)

With my original version of this attempt, which used fread() instead of read.table(), I got this error:

Error in fread(unz(temp, "loan.csv")) : 
  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

I'm also getting an error with the 5th answer (Mac-specific) to the SO question linked above:

loans <- fread("curl https://www.kaggle.com/wendykan/lending-club-loan-data/downloads/lending-club-loan-data.zip | tar -xf- --to-stdout *loan.csv")

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100   149  100   149    0     0    334      0 --:--:-- --:--:-- --:--:--   334
tar: Unrecognized archive format
tar: *loans.csv: Not found in archive
tar: Error exit delayed from previous errors.

Error in fread("curl https://www.kaggle.com/wendykan/lending-club-loan-data/downloads/lending-club-loan-data.zip | tar -xf- --to-stdout *loans.csv") : 
  File is empty: /var/folders/b1/d481ykzd3j14kr8nkx8kn83m0000gn/T//RtmpcjmrIa/file932f299c7cc4
bshelt141

1 Answer

The failures have several distinct causes:

  1. fread doesn’t accept connections such as the one returned by unz; it needs a file name, a shell command, a URL, or the data itself. read.table does accept connections.
  2. fread does work with more elaborate shell commands, but you cannot untar a ZIP file because it’s not a TAR archive. You can use funzip, as suggested in the same answer (but only if your ZIP archive contains just a single file).

… you could also simply use the unzip R function.
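The two R-only routes mentioned in this answer can be sketched as below: reading one member through the connection that `unz()` returns, and extracting just that member with `unzip()` so that a file-based reader such as `fread()` can load it from disk. This is a minimal sketch, assuming the download really is a ZIP archive and that loan.csv is a comma-separated file with a header row; the helper names `read_zip_member` and `extract_member` are hypothetical. It uses `read.csv()` rather than `read.table()` because `read.table()` defaults to whitespace-separated fields and no header row.

```r
# Hypothetical helpers for a ZIP archive that contains several files.

# Route 1: read one CSV member through a connection, without extracting it.
# unz() returns an unopened connection; read.csv() (a read.table() wrapper)
# opens it, reads it, and closes it again. data.table::fread() would reject it.
read_zip_member <- function(zip_path, member) {
  read.csv(unz(zip_path, member), stringsAsFactors = FALSE)
}

# Route 2: extract only the requested member with utils::unzip() and return
# its path, so any file-based reader (including fread()) can be used.
extract_member <- function(zip_path, member, exdir = tempfile("unzipped_")) {
  # `files` must match the name stored in the archive; check it with
  # utils::unzip(zip_path, list = TRUE)$Name if the member sits in a subfolder.
  paths <- utils::unzip(zip_path, files = member, exdir = exdir)
  if (length(paths) == 0) {
    stop("could not extract '", member, "' from ", zip_path)
  }
  paths
}

# Usage, assuming `temp` holds a genuine ZIP archive that contains loan.csv:
# loans <- read_zip_member(temp, "loan.csv")
# loans <- data.table::fread(extract_member(temp, "loan.csv"))
```

Unlike funzip, both routes select a member by name, so they also work when the archive holds several files.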

Konrad Rudolph
  • I edited my first attempt in the question and used `read.table()` instead of `fread()`, and included the new error message I'm receiving. Also, as mentioned at the beginning of the question, the `.zip` archive contains multiple files, so the `funzip` command will not work in this situation. – bshelt141 Aug 16 '17 at 16:09
  • @bshelt141 Right, the problem here is that the file you’re downloading is actually an HTML file not a ZIP file. You can’t access the URI directly — if I try, my browser redirects me to https://www.kaggle.com/wendykan/lending-club-loan-data. Apparently you need to login to download data sets. You can do this in R via the httr package. Warning: it’s *a lot* more complex. – Konrad Rudolph Aug 16 '17 at 16:15
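One way to confirm from R that the download is not a ZIP archive, before calling `unz()` or `unzip()`, is to look at its first four bytes: a ZIP archive with at least one entry begins with the local-file-header signature `PK\x03\x04`, while a login page begins with HTML text. This is a minimal sketch; the function name `is_zip_file` is hypothetical, it does not recognize empty archives (which start with a different signature, `PK\x05\x06`), and it only detects the problem rather than performing the login.

```r
# Hypothetical helper: TRUE if `path` starts with the ZIP local-file-header
# signature "PK\x03\x04" (bytes 0x50 0x4B 0x03 0x04), FALSE otherwise.
is_zip_file <- function(path) {
  if (!file.exists(path) || file.size(path) < 4) return(FALSE)
  magic <- readBin(path, what = "raw", n = 4)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Usage with the question's zip_url (not run here):
# temp <- tempfile(fileext = ".zip")
# download.file(zip_url, temp, mode = "wb")  # "wb" keeps binary data intact on Windows
# if (!is_zip_file(temp)) {
#   print(readLines(temp, n = 1, warn = FALSE))  # expected to be HTML for this URL
#   stop("the download is not a ZIP archive")
# }
```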